% A program by Oege de Moor and Jeremy Gibbons % Sept 99 % % % | lines | chars | size(KB) | time(s) | % ---------------------+-------+-------+----------+----------+ % Haskell ghc 4.01 | 183 | 4676 | 453 | 4.72 | % Haskell hbc 0.9999.4 | 183 | 4676 | 196 | 10.34 | % C++ g++ 2.90.27 | 416 | 6310 | 8 | 0.43 | % As one of the authors of this paper, I'd like to echo Manuel's warning % about quoting out of context. The paper is about Haskell as a tool in % designing and presenting algorithms, not about performance. The Haskell % program was written for clarity, to explain a fairly tricky algorithm. The % figures are there to show that in spite of that clarity-first style, the % program is still suitable for sizable experiments. % % The sources (both Haskell and C++) are on my home page at % % http://www.comlab.ox.ac.uk/oucl/users/oege.demoor/homepage.htm %% created by ODM@15:00, August 5, 1995 %% modified by ODM@15:00, April 16, 1996 %% more modifications JG, 10/6/97, 13/6/97, 16/6/97, ..., 2/7/97 %% rearranged JG 14/7/98, 16/11/98, 3/12/98 %% minor changes, make suitable for compilation by ghc/hbc, new test results ODM, 3/2/99 \documentclass[12pt]{article} \usepackage{a4} \begin{document} \newif\ifskipspaces \def\marginnote#1{% \ifvmode \leavevmode \skipspacestrue \else \skipspacesfalse \fi \domarginnote{#1}% \ifskipspaces \ignorespaces \fi} \iftrue \def\domarginnote#1{\marginpar{\scriptsize\raggedright\hspace{0pt}#1}} \else \def\domarginnote#1{} \fi \makeatletter % "code" environment is like verbatim, but with negative vertical space % before and after (so the verbatim material can start and end with a blank % line, which doesn't appear in the printed result). \begingroup \catcode `|=0 \catcode `[= 1 \catcode`]=2 \catcode `\{=12 \catcode `\}=12 \catcode`\\=12 |gdef|@xcode#1\end{mcode}[#1|end[mcode]] |endgroup \def\mcode{\unskip \vspace{-\baselineskip}% \@verbatim \frenchspacing\@vobeyspaces \@xcode} \def\endmcode{\if@newlist \leavevmode\fi\endtrivlist\vspace{-\baselineskip}} \makeatother \def\implies{\Rightarrow} \def\iff{\Leftrightarrow} \title{Bridging the Algorithm Gap: \\ A Linear-time Functional Program \\ for Paragraph Formatting} \author{ \small\begin{tabular}[t]{l} {\large Oege de Moor}\\ Programming Research Group \\ Oxford University\\ % Wolfson Building \\ % Parks Road \\ % Oxford OX1 3QD, UK \texttt{oege@comlab.ox.ac.uk} \end{tabular} \and \small\begin{tabular}[t]{l} {\large Jeremy Gibbons}\\ School of Computing \& Math.\ Sciences \\ Oxford Brookes University\\ % Gipsy Lane \\ % Headington \\ % Oxford OX3 0BP, UK \texttt{jgibbons@brookes.ac.uk} \end{tabular} } \date{\today} \maketitle \begin{abstract} \noindent In the constructive programming community it is commonplace to see formal developments of textbook algorithms. In the algorithm design community, on the other hand, it may be well known that the textbook solution to a problem is not the most efficient possible. However, in presenting the more efficient solution, the algorithm designer will usually omit some of the implementation details, thus creating an \emph{algorithm gap} between the abstract algorithm and its concrete implementation. This is in contrast to the formal development, which usually proceeds all the way to the complete concrete implementation of the less efficient solution. We claim that the algorithm designer is forced to omit some of the details by the relative expressive poverty of the Pascal-like languages typically used to present the solution. 
The greater expressiveness provided by a functional language would allow the whole story to be told in a reasonable amount of space. In this paper we use a functional language to present the development of a sophisticated algorithm all the way to the final code. We hope to bridge the algorithm gap between abstract and concrete implementations, and thereby facilitate communication between the two communities. \end{abstract} {\renewcommand{\thefootnote}{\fnsymbol{footnote}} \footnotetext{This paper is a revision of Technical Report CMS-TR-97-03 from the School of Computing and Mathematical Sciences, Oxford Brookes University. %It is currently being submitted for publication.
}}

\section{Introduction}

The paragraph formatting problem \cite{knuth81} is a favourite example for demonstrating the effectiveness of formal methods. Two particularly convincing derivations can be found in \cite{bird86} and \cite{morgan94}. The algorithms derived in these references are applications of dynamic programming, and their time complexity is $O(\min(wn,n^2))$, where $w$ is the maximum number of words on a line, and $n$ is the number of words to be formatted. Among algorithm designers it is well known that one can solve the paragraph problem in $O(n)$ time, independent of $w$ \cite{eppstein92b,galil89,hirschberg87a,hirschberg87b}. Typical presentations of these linear algorithms do, however, ignore some details (for instance, that the white space on the last line does not count), thus creating an \emph{algorithm gap} between the abstract algorithm and its concrete implementation. This contrasts with the formal developments cited above, which present concrete code, albeit for a less efficient solution. This paper is an experiment in bringing formal methods and algorithm design closer together, through the medium of functional programming. It presents a linear-time algorithm for paragraph formatting in a semi-formal style. The algorithm is given as executable code; in fact, the \LaTeX\ file used to produce this paper is also an executable \emph{Haskell} program. In writing this paper we hope to convince the reader that a functional style can be suitable for communicating non-trivial algorithms, without sacrificing rigour or clarity or introducing an algorithm gap. Of course, there exist algorithms that are difficult to express functionally. In fact, it came as a surprise to us that the algorithm presented here can indeed be implemented without resorting to any imperative language features. One of us (OdM) first attempted to explain the algorithm (which is very similar to that in \cite{galil89}) in 1992, and then implemented it in Modula-2; when that program was discussed at the Oxford \emph{Problem Solving Club}, it met with a lot of disappointment because of the need for destructive updates. Only recently did we realise how the algorithm could be expressed functionally. This paper is therefore also a contribution to the ongoing effort of determining what can be done efficiently in a purely functional style.

\iffalse
\begin{mcode}

>module Main where
>import Char
>import IO
>import System

\end{mcode}
\fi

\subsection{Preliminaries}

We do not assume that the reader is an expert in functional programming; we explain the necessary syntax and standard functions of Haskell as we go. (Of course, some familiarity with a modern lazy functional language such as Haskell, Miranda\footnote{Miranda is a trademark of Research Software Ltd.} or Hope would be helpful; but we trust that the notation is fairly self-evident.)
In addition to the standard functions of Haskell, we use a number of other primitives, which are explained in this section. \begin{description} \item[fold1:] The function \texttt{fold1} is related to the standard function \texttt{foldr}, but it operates on non-empty lists. Informally, we have \begin{eqnarray*} \hbox to 0.5in{$\verb"fold1 step start [a0,a1,...,an]"$\hss} \\ &=& \verb"a0 `step` (a1 `step` (... `step` start an))" \end{eqnarray*} (Here, `\verb"[a0,a1,...,an]"' denotes a list, and writing the binary function \texttt{f} inside backwards quotes, \verb"`f`", allows it to be used as an infix operator.) In words, \texttt{fold1} traverses a list from right to left, applying \texttt{start} to the last element, and `adding in' the next element at each stage using the function \texttt{step}. The formal definition of \texttt{fold1} is \begin{mcode} >fold1 :: (a->b->b) -> (a->b) -> [a] -> b >fold1 f g [a] = g a >fold1 f g (a:x) = f a (fold1 f g x) \end{mcode} (The first line is a \emph{type declaration}, stating that \verb"fold1" takes three arguments~--- a binary function of type \verb"a->b->b", a unary function of type \verb"a->b", and a list of type \verb"[a]"~--- and returns a value of type \verb"b". The binary operator `\verb":"' is `cons', prepending an element onto a list. The definition of \verb"fold1" is by \emph{pattern matching}: the first equation that matches the argument applies. Because neither equation matches the empty list, \verb"fold1" is undefined there.) \item[scan1:] The function \texttt{scan1} records all intermediate results of the computation of \texttt{fold1} in a list: \begin{mcode} >scan1 :: (a->b->b) -> (a->b) -> [a] -> [b] >scan1 f g = fold1 f' g' > where g' a = [g a] > f' a s = f a (head s) : s \end{mcode} (Here, the function \texttt{head} returns the first element of a non-empty list.) For example, the function \texttt{tails}, which returns all non-empty suffixes of its argument, can be defined \begin{mcode} >tails :: [a] -> [[a]] >tails = scan1 (:) (:[]) \end{mcode} (The second argument \verb"(:[])" to \texttt{scan1} is the binary operator `\texttt{:}' already supplied with its second argument, in this case the empty list; the function \verb"(:[])" that results is the function taking an element into a singleton list containing just that element.) The relationship between \texttt{fold1} and \texttt{scan1} is succinctly expressed in the so-called \emph{scan lemma}: \begin{eqnarray*} \verb"scan1 f g" &=& \verb"map (fold1 f g) . tails" \end{eqnarray*} (Here, the higher-order function \texttt{map} applies the function which is its first argument to every element of the list which is its second argument; the binary operator `\verb"."' is function composition.) The scan lemma will be useful towards the end of this paper. \item[single:] The operator \texttt{single} tests whether its argument is a singleton list: \begin{mcode} >single :: [a] -> Bool >single [a] = True >single _ = False \end{mcode} (The pattern `\verb"_"' matches any argument, thereby acting as a kind of `else' clause.) It is thus similar to the standard Haskell function \texttt{null}, which tests for emptiness. \item[minWith:] The function \texttt{minWith f} takes a list \texttt{x} and returns an element of \texttt{x} whose \texttt{f}-value is minimal. 
Formally, \texttt{minWith} can be defined in terms of \texttt{fold1}:

\begin{mcode}

>minWith :: (a->Int) -> [a] -> a
>minWith f = fold1 choice id
> where choice a b | f a <= f b = a
>                  | otherwise  = b

\end{mcode}

(The expression `\verb"f a <= f b"' here is a \emph{guard}. The first equation for \verb"choice" applies if the guard holds, and the second equation applies if it does not.) \end{description}

\section{Specifying the problem} \label{sec:spec}

In the paragraph problem, the aim is to lay out a given text as a paragraph in a visually pleasing way. A text is given as a list of words, each of which is a string, that is, a sequence of characters:

\begin{mcode}

>type Txt = [Word]
>type Word = String

\end{mcode}

A paragraph is a sequence of lines, each of which is a sequence of words:

\begin{mcode}

>type Paragraph = [Line]
>type Line = [Word]

\end{mcode}

The problem can be specified as

\begin{mcode}

>par0 :: Txt -> Paragraph
>par0 = minWith cost . filter feasible . formats

\end{mcode}

or informally, to compute the minimum-cost format among all feasible formats. (The function \verb"filter p" takes a list \texttt{x} and returns exactly those elements of \texttt{x} that satisfy the predicate \texttt{p}.) In the remainder of this section we formalise the three components of this specification. The result will be an executable program, but one whose execution takes exponential time.

\subsection{Paragraph formats}

The function \texttt{formats} takes a text and returns all possible formats as a list of paragraphs:

\begin{mcode}

>formats :: Txt -> [Paragraph]
>formats = fold1 next_word last_word
> where last_word w = [ [[w]] ]
>       next_word w ps = map (new w) ps ++ map (glue w) ps

>new w ls = [w]:ls
>glue w (l:ls) = (w:l):ls

\end{mcode}

(Here, the binary operator \verb"++" is list concatenation.) That is, for the last word alone there is just one possible format, and for each remaining word we have the option either of putting it on a new line at the beginning of an existing paragraph, or of gluing it onto the front of the first line of an existing paragraph.

\subsection{Feasible paragraph formats}

A paragraph format is feasible if every line fits:

\begin{mcode}

>feasible :: Paragraph -> Bool
>feasible = all fits

\end{mcode}

(The predicate \verb"all p" holds of a list precisely when all elements of the list satisfy the predicate \texttt{p}.) We define a global constant \texttt{maxw} for the maximum line width:

\begin{mcode}

>maxw :: Int
>maxw = 70

\end{mcode}

Of course, for flexibility many of the functions we develop should be parameterized by \texttt{maxw}; however, a global constant makes the presentation of the algorithm less cluttered. It is straightforward to make \texttt{maxw} a parameter throughout. A line \texttt{fits} if its width is at most the maximum line width:

\begin{mcode}

>fits :: Line -> Bool
>fits xs = (width xs <= maxw)

\end{mcode}

In formatting a text as a paragraph, the maximum line width should never be exceeded. Our programs will halt with a run-time error if an individual word exceeds the maximum line width; in practical applications one would need to deal with such pathological inputs more gracefully. The \texttt{width} of a line is defined to be its total length when the words are printed next to each other, with one space character between each consecutive pair of words:

\begin{mcode}

>width :: Line -> Int
>width = fold1 plus length
> where plus w n = length w + 1 + n

\end{mcode}

The function \texttt{length} returns the number of elements in a list.
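For example, here is a small worked evaluation of \texttt{width} (the words are invented; the arithmetic simply unfolds the definition above):

\begin{verbatim}
  width ["bridging", "the", "gap"]
= length "bridging" + 1 + (length "the" + 1 + length "gap")
= 8 + 1 + (3 + 1 + 3)
= 16
\end{verbatim}

that is, three words totalling 14 characters, plus two separating spaces.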
This notion of width is appropriate for displaying paragraphs on a device where every character has the same width. The programs presented below can easily be modified to deal with proportional spacing, where different characters may have different widths. In that case, however, one will need constant-time access arrays, which are not available in all functional languages. \subsection{The cost of a paragraph format} In addition to the maximum line width \texttt{maxw}, the problem also depends upon an optimum line width \texttt{optw}, another global constant: \begin{mcode} >optw :: Int >optw = 63 \end{mcode} The optimum line width should of course be at most the maximum line width. A \emph{visually pleasing} paragraph is one in which the width of each line in the paragraph is as close to the optimum line width as possible. More precisely, we wish to minimise the total \emph{cost}, the sum of the squares of the deviations of the line widths from the optimum line width: \begin{mcode} >cost :: Paragraph -> Int >cost = fold1 plus (const 0) > where plus l n = linc l + n > linc l = (optw - width l)^2 \end{mcode} Note that the last line gets treated as a special case: it does not count towards the total cost of the paragraph. This is achieved by the function \texttt{const}, which satisfies the equation \verb"const a b = a" for any \texttt{b}. \section{The standard algorithm} \label{sec:standard} The specification in Section~\ref{sec:spec} makes an inefficient program because it maintains too many candidate solutions. It is a \emph{generate and test} algorithm, of the form \begin{verbatim} best . filter ok . generate \end{verbatim} The standard technique for improving algorithms of this form is to \emph{promote} the test \verb"filter ok" and the selection \verb"best" inside the generator, so as to avoid generating unnecessary candidates. Standard theory \cite{bird86,bird90,bird93d} tells us what properties of \verb"generate", \verb"ok" and \verb"best" this requires. In our particular case it is sufficient that \texttt{new} is monotonic in its second argument: \begin{eqnarray} \verb"cost ls" \le \verb"cost ls'" &\implies& \verb"cost (new w ls)" \le \verb"cost (new w ls')" \label{eqn:new-monotonic} \end{eqnarray} and furthermore, that \texttt{glue} is monotonic in its second argument in a slightly weaker sense~--- namely, for paragraphs with the same first line: \begin{eqnarray} \hbox to 0.5in{$\verb"cost (l:ls)" \le \verb"cost (l:ls')"$\hss} \nonumber\\ &\implies& \verb"cost (glue w (l:ls))" \le \verb"cost (glue w (l:ls'))" \label{eqn:glue-monotonic} \end{eqnarray} Note that neither of these two properties depends on the precise definition of the cost of individual lines, as returned by the function \texttt{linc}; all that matters is that the total cost is the sum of the costs of the individual lines. Using the above two monotonicity properties, the standard theory alluded to above concludes that the following dynamic programming algorithm is a valid solution to the specification \texttt{par0}. This is a well-rehearsed development, so we do not repeat it here. \begin{mcode} >par1 > = minWith cost . fold1 step start > where > step w ps = filter fitH (new w (minWith cost ps):map (glue w) ps) > start w = filter fitH [ [[w]] ] >fitH = fits . head \end{mcode} Note that \texttt{par1} is not equal to \texttt{par0}; in particular, if there is more than one optimal paragraph, \texttt{par0} and \texttt{par1} may return different (but still optimal) paragraphs. 
To express this refinement property formally we would have to generalize from functional programming to relational programming~\cite{bdm96}, a step that is beyond the scope of this paper. For efficiency, we need to perform a \emph{tupling transformation} \cite{pettorossi84,chin90} to avoid recomputing the width of the first line and the cost of the remaining candidate solutions. We represent the paragraph \texttt{(l:ls)} by the triple

\begin{verbatim}
(l:ls, width l, cost ls)
\end{verbatim}

(Because \texttt{ls} may be empty, we stipulate also that \verb"cost [] = 0".) The program resulting from this data refinement is as follows.

\begin{mcode}

>par1'
> = the . minWith cost . fold1 step start
>   where
>   step w ps = filter fitH (new w (minWith cost ps):map (glue w) ps)
>   start w = filter fitH [([[w]], length w,0)]
>   new w ([l],n,0) = ([w]:[l], length w, 0)
>   new w p = ([w]:ls , length w, cost p) where (ls,n,m) = p
>   glue w (l:ls,n,m) = ((w:l):ls, length w + 1 + n, m)
>   the (ls,n,m) = ls
>   width_hd (ls,n,m) = n
>   cost_tl (ls,n,m) = m
>   linc_hd p = (optw - width_hd p)^2
>   cost ([_],_,_) = 0
>   cost p = linc_hd p + cost_tl p
>   fitH p = width_hd p <= maxw

\end{mcode}

\section{Improving the standard algorithm}

The algorithm at the end of Section~\ref{sec:standard} is the standard dynamic programming solution to the paragraph problem, and its time complexity is $O(\min(wn, n^2))$ where $w$ is the maximum number of words on a line and $n$ is the number of words in the paragraph. Computational experiments confirm that this is an accurate estimate of the program's behaviour. This algorithm is the final product of the typical textbook derivation. It does not make use of any special properties of \texttt{linc}, the function that returns the cost of an individual line. Indeed, if nothing more is known about \texttt{linc}, it is not possible to improve upon this algorithm, as noted in \cite{hirschberg87a}. However, as we shall show in Sections~\ref{sec:dominance} to~\ref{sec:differencing}, for our particular choice of \texttt{linc}, namely

\begin{verbatim}
linc l = (optw - width l)^2
\end{verbatim}

substantial improvements are possible; in fact, we will end up with a program linear in $n$ and independent of $w$. In Section~\ref{sec:dominance} we determine a \emph{dominance criterion}, whereby some candidate solutions can be discarded because they are dominated by other `better' solutions. Therefore, after processing each word of the paragraph we may `trim' the collection of candidate solutions to remove the dominated ones. To obtain a linear-time solution we can afford to do at most an amortized constant amount of work for each word in the paragraph. Precisely one new candidate solution is added for each word, so it suffices that the amount of work performed with each word is proportional to the number of candidate solutions discarded. The obvious implementation of the trimming operation introduced in Section~\ref{sec:dominance} involves reconsidering every candidate solution for each word in the paragraph. In Section~\ref{sec:forecasting} we show that this is not necessary: trimming in fact results in removing some candidate solutions from the beginning of the list and some others from the end. Crucially, the solutions in the middle of the list need not be considered; at the beginning and at the end of the list, as soon as one undominated solution is found the trimming can stop.
That still leaves the \verb"filter" and the \verb"map" in the definition of the function \verb"step", each of which traverses all candidate solutions for each word of the paragraph. In Section~\ref{sec:filtering} we replace the \verb"filter" with a function that performs work proportional to the number of solutions discarded, independently of the number of solutions retained. In Section~\ref{sec:differencing} we eliminate the \verb"map (glue w)", by making a change of representation under which \verb"glue w" is the identity function. The resulting algorithm is linear in the paragraph length and independent of the line width.

\section{Dominance criteria} \label{sec:dominance}

In this section we determine a \emph{dominance criterion} whereby some paragraph formats can be discarded because they are dominated by another; thus, fewer candidate solutions need be maintained at each step. Dominance criteria are the basis for most improvements over straightforward dynamic programming. In our case, the dominance criterion is a consequence of the fact that the function \texttt{linc} is \emph{concave}, in the sense that \marginnote{label more equations, and use in cross-refs}
\begin{eqnarray*}
\verb"linc (l++m) - linc l" &\le& \verb"linc (k++l++m) - linc (k++l)"
\end{eqnarray*}
Consequently, the monotonicity property of \texttt{glue} (Equation~\ref{eqn:glue-monotonic}) can be strengthened to:
\begin{eqnarray}
\hbox to 0.5in{$\verb"cost (l:ls)" \le \verb"cost ((l++m):ms)"$\hss} \nonumber\\
&\implies& \verb"cost (glue w (l:ls))" \le \verb"cost (glue w ((l++m):ms))"
\label{eqn:glue-monotonic-stronger}
\end{eqnarray}
In words, the concavity property says that appending a line \verb"m" onto a line \verb"l" has no worse an effect than appending \verb"m" onto a longer line \verb"k++l", and the monotonicity property says that if a paragraph with a shorter first line is better than another paragraph, it will remain better when more words are glued to the first lines of both paragraphs~--- a cheaper paragraph with a shorter first line dominates a costlier paragraph with a longer first line.

\subsection{Exploiting concavity} \label{sec:concavity}

We can exploit the dominance criterion to arrive at an improved definition of \texttt{step}. Note that \verb"step w" maintains the property `is in strictly increasing order of length of first line' of the list of candidate solutions. Now suppose that we have two formats \texttt{p} and \texttt{q}, in that order, in the list of candidate solutions; the first line of \texttt{q} is longer than the first line of~\texttt{p}. Suppose also that $\verb"cost p" \le \verb"cost q"$. Then \verb"p" dominates \verb"q": by the monotonicity property of \texttt{new} (Equation~\ref{eqn:new-monotonic}) and the stronger property of \texttt{glue} (Equation~\ref{eqn:glue-monotonic-stronger}), it follows that \texttt{q} may be safely discarded, because any candidate solution generated from \texttt{q} will always be beaten by the candidate solution generated in the same way from \texttt{p}.
So we may improve the definition of \texttt{step} to \begin{verbatim} step w ps = trim (filter fitH (new w (minWith cost ps):map (glue w) ps)) \end{verbatim} where the function \texttt{trim} discards the dominated candidate solutions (namely, the formats~\verb"q" for which there is a format~\verb"p" appearing earlier in the collection with $\verb"cost p" \le \verb"cost q"$): \begin{verbatim} trim [] = [] trim [p] = [p] trim (ps++[p,q]) | cost p <= cost q = trim (ps++[p]) | otherwise = trim (ps++[p]) ++ [q] \end{verbatim} This is not a valid definition in Haskell, because patterns involving \verb"++" are not allowed. However, the pattern-matching is easily re-expressed in terms of the standard functions \texttt{last} and \texttt{init}, which return the last element and the remainder of a list, respectively; we omit the details. \iffalse \begin{mcode} >par2' > = minWith cost . fold1 step start > where > step w ps = trim (filter fitH (new w (minWith cost ps):map (glue w) ps)) > start w = filter fitH [ [[w]] ] > trim [] = [] > trim [p] = [p] > trim pspq > | cost p <= cost q = trim psp > | otherwise = trim psp ++ [q] > where q = last pspq > psp = init pspq > p = last psp > ps = init psp \end{mcode} \fi Note that we could trim from the back, as here, or from the front. In Section~\ref{sec:forecasting} we will see that trimming from the back is the better choice, because we will develop a criterion for stopping trimming early. \subsection{Trimming introduces order} \label{sec:trim-order} Now note further that the result of a \texttt{trim} is in strictly decreasing order of cost, so the cheapest candidate solution is the last in the list. We can therefore improve the definition of \texttt{step} further, by using \verb"last" instead of \verb"minWith cost": \begin{verbatim} step w ps = trim (filter fitH (new w (last ps):map (glue w) ps)) \end{verbatim} The resulting algorithm is an improvement over the standard solution, but it is still not linear because at each step the whole list of intermediate solutions is traversed. \section{Forecasting the future} \label{sec:forecasting} To remedy this inefficiency we will develop a criterion for stopping trimming before having traversed the whole list of candidate solutions. Observe that we maintain a list of paragraph formats with strictly increasing first line lengths, and strictly decreasing costs. Say that a candidate solution element is \emph{bumped} by its predecessor when it is eliminated in the computation of \texttt{trim}. Can we forecast how much gluing is needed before a particular format \texttt{p} is bumped by its predecessor? If so, we may be able to stop trimming early: if a word shorter than the length forecasted for \texttt{p} has been glued, then \texttt{p} need not be considered for trimming. \subsection{The bump factor} We introduce a function \verb"cg :: Paragraph -> Int -> Int" (for `cost-glue') such that \marginnote{wouldn't it be simpler with a \texttt{cg'} such that \texttt{cg'~p~n = cg~p~(n+1)}?} \begin{verbatim} cg p (length w + 1) = cost (glue w p) \end{verbatim} One suitable definition of \texttt{cg} is \begin{verbatim} cg [l] n = 0 cg (l:ls) n = (optw - (n + width l))^2 + cost ls \end{verbatim} In words, \texttt{cg p n} is the total cost of the paragraph formed after a sequence of words whose width is \texttt{n} has been glued to the first line of paragraph \texttt{p}. Note that we do not check whether the maximum line width is exceeded, and so the notion of cost may be meaningless in terms of paragraphs. 
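For example (a made-up instance, with \texttt{optw} $=$ 63 as before):

\begin{verbatim}
  cg [["to","be"], ["or"]] 10
= (optw - (10 + width ["to","be"]))^2 + cost [["or"]]
= (63 - 15)^2 + 0
= 2304
\end{verbatim}

This is the cost the paragraph would have once a further 10 columns' worth of words had been glued onto its first line.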
We allow negative values of \texttt{n} as arguments of \verb|cg|. Using the function \texttt{cg}, we can forecast when a paragraph \texttt{p} will bump a paragraph \texttt{q} in the process of gluing more words to both paragraphs. Define the function \texttt{bf} (for `bump factor') by \begin{verbatim} bf p q = setmin {n | cg p n <= cg q n} `min` (maxw - width (head q) + 1) \end{verbatim} (This is not a Haskell definition ---~Haskell does not have sets, and besides, the quantification is over a potentially infinite set~--- but it does completely determine \texttt{bf}.) After gluing a width of \texttt{bf p q} to both \texttt{p} and \texttt{q}, paragraph \texttt{p} will be at least as good as \texttt{q}; furthermore, if we glue more than \texttt{bf p q}, paragraph \texttt{p} will still be as good as \texttt{q} (by concavity), so \texttt{q} can be discarded. The second term in the definition of \verb|bf| reflects the fact that after gluing \verb|maxw - width (head q) + 1|, paragraph \verb|p| is always better than paragraph \verb|q|, because the first line of \verb|q| exceeds the maximum line width. It can happen that \verb|bf| returns a negative number, namely when \verb|p| is better than \verb|q| to start with. \subsection{Using the bump factor} \label{sec:using-bump-factor} Let us now formalise these observations as properties of \texttt{glue}. Starting with \texttt{cg}, we have that \begin{verbatim} cg (glue w p) n = cg p (n + 1 + length w) \end{verbatim} Consequently, \texttt{bf} satisfies \begin{verbatim} bf (glue w p) (glue w q) = bf p q - (1 + length w) \end{verbatim} and therefore \begin{eqnarray} \hbox to 0.5in{$\verb"bf p q" \;<\; \verb"bf r s"$\hss} \nonumber\\ &\iff& \verb"bf (glue w p) (glue w q)" \;<\; \verb"bf (glue w r) (glue w s)" \label{eqn:glue-respects-bf} \end{eqnarray} Finally, define the predicate \texttt{better} by \marginnote{`$\mathtt{better\;w\;p\;q}$' is equivalent to `$\mathtt{1{+}\#w \ge bf\,p\,q}$'} \begin{verbatim} better w p q = cost (glue w p) <= cost (glue w q) || not (fitH (glue w q)) \end{verbatim} (The binary operator \verb"||" is boolean disjunction; later on we use \verb"&&", which is boolean conjunction.) In words, \verb"better w p q" states that, after gluing a word \texttt{w}, paragraph \texttt{p} will be better than paragraph \texttt{q}, either on grounds of cost or because the first line of \texttt{q} has become too long. Suppose \verb|p = l0:ls0|, \verb|q = (l0++l1):ls1|, \verb|r = (l0++l1++l2):ls2|, and $\verb"bf p q" \le \verb"bf q r"$. We have the following property: \begin{eqnarray*} \verb"better w q r" &\implies& \verb"better w p q" \;\land\; \verb"better w p r" \end{eqnarray*} In words, this property says that whenever \verb|q| gets better than \verb|r| by gluing a word \verb|w|, then \verb|p| is better than both \verb|q| and \verb|r| after gluing the same word. It follows that \verb|q| can never be useful, and therefore, if we had a triple like \verb|p|, \verb|q| and \verb|r| in the list of candidate solutions, \verb|q| could be safely discarded. 
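To see the forecast at work, here is a small invented example. Suppose that $\verb"bf p q" = \verb"15"$, so that \texttt{p} overtakes \texttt{q} once 15 further columns have been glued to both first lines. Gluing a six-letter word \texttt{w} contributes \verb"1 + length w = 7" columns, so

\begin{verbatim}
bf (glue w p) (glue w q) = 15 - 7 = 8
\end{verbatim}

After a second such word the forecast drops to 1, and after a third it reaches $-6$; at that point the glued width has exceeded the original forecast of 15, so \texttt{p} is at least as good as \texttt{q}, and \texttt{q} can be discarded.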
\subsection{Tight lists of solutions} We shall exploit this observation by maintaining the list of candidate solutions so that, as well as \begin{enumerate} \item \label{property:width} the lengths of the first lines are in strictly increasing order, and \item \label{property:cost} the costs of the candidate solutions are in strictly decreasing order, \end{enumerate} as before, also \begin{enumerate} \setcounter{enumi}{2} \item \label{property:bf} the \texttt{bf}-values of consecutive adjacent pairs of candidate solutions are in strictly decreasing order. \end{enumerate} Such a list of candidate solutions we call \emph{tight}. To see how we can keep the list of candidate solutions tight, recall the definition of \verb|step| from Section~\ref{sec:trim-order}: \begin{verbatim} step w ps = trim (filter fitH (new w (last ps):map (glue w) ps)) \end{verbatim} We argued in Section~\ref{sec:concavity} that \verb"step w" maintains properties~\ref{property:width} and~\ref{property:cost}. We now show how the third property is maintained. Notice that all three properties are preserved when taking a subsequence of (that is, deleting some elements from) the list of candidate solutions. For properties~\ref{property:width} and~\ref{property:cost} this is obvious. For property~\ref{property:bf} it follows from the fact that $\verb"bf p q" \ge \verb"bf p r" \ge \verb"bf q r"$, provided that $\verb"bf p q" > \verb"bf q r"$ and \texttt{p}, \texttt{q} and \texttt{r} are in strictly increasing order of length of first line; \marginnote{I have a marvellous demonstration that this margin is too narrow to hold\ldots} this fact is not too hard to establish. Suppose that \verb|ps| satisfies property~\ref{property:bf}. Because \verb|glue| respects the \verb|bf| order (Equation~\ref{eqn:glue-respects-bf}), the list \verb|map (glue w) ps| also satisfies property~\ref{property:bf}. It is however not necessarily the case that \verb"new w (last ps) : map (glue w) ps" satisfies property~\ref{property:bf}: the presence of the new element at the beginning may require some of the initial elements of \verb|map (glue w) ps| to be removed. So, the cons operator \verb|(:)| in the definition of \verb"step" is replaced by a \emph{smart constructor} \verb|add|, which does the required pruning~--- by the argument at the end of Section~\ref{sec:using-bump-factor}, if we have three consecutive candidate solutions \texttt{p}, \texttt{q} and \texttt{r} with $\verb"bf p q" \le \verb"bf q r"$, solution \texttt{q} can be discarded. \begin{verbatim} add p [] = [p] add p [q] = [p,q] add p ([q,r]++rs) | bf p q <= bf q r = add p ([r]++rs) | otherwise = [p,q,r]++rs \end{verbatim} Now the list of candidate solutions \verb"new w (last ps) `add` map (glue w) ps" satisfies property~\ref{property:bf}. Note that \verb"p `add` ps" is a subsequence of \verb"p:ps", so properties~\ref{property:width} and~\ref{property:cost} are still maintained; for the same reason, all three properties are maintained by the \verb"filter fitH" too. Now, however, property~\ref{property:bf} permits an optimization of \texttt{trim}, whereby we can stop trimming early. This was the reason for introducing bump factors in the first place. Suppose that \texttt{ps} satisfies property~\ref{property:bf}; we claim that \verb"trim ps" can be written \begin{verbatim} trim [] = [] trim [p] = [p] trim (ps++[p,q]) | cost p <= cost q = trim (ps++[p]) | otherwise = ps++[p,q] \end{verbatim} without a recursive call in the last clause. Here is the justification. 
Suppose that \texttt{r},~\texttt{s} are adjacent candidate solutions in \texttt{ps}, so that \texttt{r} occurs before \texttt{p} in the list \verb"ps++[p,q]". Then $\verb"bf r s" > \verb"bf p q"$, by property~\ref{property:bf}. Suppose also that $\verb"cost p" > \verb"cost q"$. Then the word \texttt{w} that has just been processed was too short for \texttt{p} to bump \texttt{q}, and so was also too short for \texttt{r} to bump \texttt{s} (because of the \texttt{bf} ordering); that is, $\verb"cost r" > \verb"cost s"$ too, and the initial segment of the list of candidate solutions ending with solution \texttt{q} already satisfies property~\ref{property:cost}. Thus, we have \verb"trim (ps++[p,q]) = ps++[p,q]"~--- the second recursive call to \texttt{trim} can be omitted when \verb"ps++[p,q]" satisfies property~\ref{property:bf} and $\verb"cost p" > \verb"cost q"$. Because we are now manipulating the list of candidate solutions at both ends, it will be profitable to use a symmetric set of list operations, where \texttt{head} and \texttt{last} are equally efficient. Such an implementation of lists is summarized in an appendix to this paper. Below, whenever we use symmetric lists, the familiar list operations are written using a dash~--- \texttt{head'}, \texttt{init'}, and so on~--- and the type of symmetric lists over \texttt{a} is written \verb"SymList a". In outline, the program now is

\begin{verbatim}
par2 = last . fold1 step start
step w ps = trim (filter fitH (new w (last ps) `add` map (glue w) ps))
\end{verbatim}

\iffalse
\begin{mcode}

>par2
> = last . fold1 step start
>   where
>   step w ps = trim(filter fitH (new w (last ps) `add` map (glue w) ps))
>   start w = filter fitH [ [[w]] ]
>   add p [] = [p]
>   add p [q] = [p,q]
>   add p (q:r:rs) | bf p q <= bf q r = add p (r:rs)
>                  | otherwise = p:q:r:rs
>   bf p q
>     | single q && cost pt == 0
>                 = (optw - wph) `min` rqh
>     | single q  = rqh
>     | otherwise = ceildiv (cost p - cost q) (2*(wqh-wph)) `min` rqh
>     where ph:pt = p
>           qh:qt = q
>           wph = width ph
>           wqh = width qh
>           rqh = maxw - wqh + 1
>   ceildiv n m = (n+m-1) `div` m
>   trim [] = []
>   trim [p] = [p]
>   trim pspq | cost p <= cost q = trim psp
>             | otherwise = pspq
>     where q = last pspq
>           psp = init pspq
>           p = last psp
>           ps = init psp

\end{mcode}
\fi

(Note that we have made use again of the fact that a tight list of paragraphs is in strictly decreasing order of cost, replacing the \verb"minWith cost" in \verb"par2" by \verb"last".) This new program is in fact quite efficient: computational experiments show that at each step of the computation, only a very small number of candidate solutions are kept. Still, all candidate solutions get inspected each time \texttt{step} is evaluated, and this remains a source of inefficiency. To make progress, we shall have to remove the subexpressions \texttt{filter fitH} and \texttt{map (glue w)} from the definition of \texttt{step}; we do this in Sections~\ref{sec:filtering} and~\ref{sec:differencing}.

\subsection{Computing the bump factor}

One point that we have not touched upon is how \texttt{bf} can be efficiently implemented. This is an exercise in high-school algebra. Note that, when \texttt{p} and \texttt{q} appear in that order in the list of candidate solutions, \texttt{p} cannot be a singleton: there is just one way of formatting a paragraph into a single line, and if that line fits it will be the last candidate solution because it has the longest first line.
Therefore there are just two cases to consider in computing \verb"bf p q": when \texttt{q} is a singleton, and when \texttt{q} is not. Recall the definition of \verb"bf": \begin{eqnarray*} \hbox to 2em{\rlap{\texttt{bf p q}}} \\ &=& \verb"setmin {n | cg p n <= cg q n} `min` (maxw - width (head q) + 1)" \end{eqnarray*} Note that the second term $\verb"rqh" = \verb"maxw - width (head q) + 1"$ is always greater than zero. \begin{description} \item[Case \texttt{q} is a singleton:] so \verb"cg q n" is zero for any \texttt{n}. Thus, the only value of \texttt{n} for which $\verb"cg p n" \le \verb"cg q n"$ would be one for which \verb"cg p n" is zero; this can only happen when $\verb"n" = \verb"optw - width (head p)"$ and $\verb"cost (tail p)" = \verb"0"$. This case therefore splits into two subcases: if \verb"cost (tail p)" is zero, then \verb"bf p q" is the smaller of \texttt{rqh} and \verb"optw - width (head p)"; otherwise, there are no suitable values of \texttt{n}, and \verb"bf p q" is simply \texttt{rqh}. \item[Case \texttt{q} is not a singleton:] Note that, for non-singleton \texttt{p}, \begin{eqnarray*} \verb"cg p n" &=& \verb"cost p + n^2 - 2*n*(optw - width (head p))" \end{eqnarray*} and so $\verb"cg p n" \le \verb"cg q n"$ precisely when \begin{eqnarray*} \texttt{n} &\ge& \verb"(cost p-cost q)/(2*(width (head q)-width (head p)))" \end{eqnarray*} (the divisor is non-zero, because of the ordering on lengths of first lines). Therefore, the first term in \verb"bf p q" is the ceiling of the fraction on the right-hand side of this inequation, and \verb"bf p q" itself is the smaller of this first term and \texttt{rqh}. \end{description} Thus, we can implement \texttt{bf} as follows: \begin{verbatim} bf p q | single q && cost pt == 0 = (optw - wph) `min` rqh | single q = rqh | otherwise = ceildiv (cost p - cost q) (2*(wqh - wph)) `min` rqh where ph:pt = p qh:qt = q wph = width ph wqh = width qh rqh = maxw - wqh + 1 \end{verbatim} where \verb"ceildiv x y" rounds the fraction \verb"x/y" up to the next integer. \section{Filtering} \label{sec:filtering} Because the list of candidate paragraphs is kept in increasing order of length of the first line, the \texttt{filter} is easily dealt with. The net effect of filtering is that the last few formats (namely, those with the longest first lines) are discarded, and the remainder are retained. Therefore, instead of \verb"filter fitH" we can use \verb"droptail (not . fitH)", where \begin{verbatim} droptail :: (a->Bool) -> [a] -> [a] droptail p [] = [] droptail p (xs++[x]) | p x = droptail p xs | otherwise = xs ++ [x] \end{verbatim} Informally, \verb"droptail p x" discards elements from the end of the list~\verb"x", stopping when the list is empty or the last element does not satisfy predicate~\verb"p". \iffalse \begin{mcode} >par2'' > = last . 
fold1 step start
>   where
>   step w ps = trim(droptail (not.fitH) (new w (last ps) `add` map (glue w) ps))
>   start w = droptail (not.fitH) [ [[w]] ]
>   droptail p [] = []
>   droptail p xsx | p x = droptail p xs
>                  | otherwise = xsx
>     where x = last xsx
>           xs = init xsx
>   add p [] = [p]
>   add p [q] = [p,q]
>   add p (q:r:rs) | bf p q <= bf q r = add p (r:rs)
>                  | otherwise = p:q:r:rs
>   bf p q
>     | single q && cost pt == 0
>                 = (optw - wph) `min` rqh
>     | single q  = rqh
>     | otherwise = ceildiv (cost p - cost q) (2*(wqh-wph)) `min` rqh
>     where ph:pt = p
>           qh:qt = q
>           wph = width ph
>           wqh = width qh
>           rqh = maxw - wqh + 1
>   ceildiv n m = (n+m-1) `div` m
>   trim [] = []
>   trim [p] = [p]
>   trim pspq | cost p <= cost q = trim psp
>             | otherwise = pspq
>     where q = last pspq
>           psp = init pspq
>           p = last psp
>           ps = init psp

\end{mcode}
\fi

\section{Differencing} \label{sec:differencing}

In this section we will get rid of the \verb"map" from the definition of \verb"step", by making a change of representation under which \verb"glue w" is the identity function. If we assume that the list operations \verb"head", \verb"tail", \verb"init" and \verb"last" take amortized constant time, then this gives an amortized linear-time algorithm for paragraph formatting. Every word of the paragraph contributes at most one new candidate solution, and the amount of work performed (by \verb"add", \verb"trim" and \verb"droptail") on the list of candidate solutions is proportional to the number of candidate solutions discarded.

\subsection{A data refinement}

Elimination of \texttt{glue} can be achieved by computing only the tail of each paragraph. As long as we have the original text available (which is the concatenation of the paragraph), all necessary quantities can be computed in terms of the tail alone:

\begin{verbatim}
length (head p) = length (concat p) - length (concat (tail p))

width (head p) | single p = width (concat p)
               | otherwise = width (concat p) - width (concat (tail p)) - 1

cost p | single p = 0
       | otherwise = (optw - width (head p))^2 + cost (tail p)
\end{verbatim}

(Recall that we stipulated that \texttt{cost [] = 0}.) The exploitation of this type of equation is known as \emph{differencing}. We shall represent a paragraph \texttt{p} by the triple \verb"rep p" where

\begin{verbatim}
rep p = ((width.concat.tail) p, (cost.tail) p, (length.concat.tail) p)
\end{verbatim}

It will be useful to have a type synonym for the new representation of paragraphs:

\begin{mcode}

>type Par = (Width,Cost,Length)
>type Width = Int
>type Cost = Int
>type Length = Int

>width_tl = fst3
>cost_tl = snd3
>len_tl = thd3

\end{mcode}

Here, the functions \texttt{fst3}, \texttt{snd3} and \texttt{thd3} return the first, second and third components of a triple, respectively.

\begin{mcode}

>fst3 (a,b,c) = a
>snd3 (a,b,c) = b
>thd3 (a,b,c) = c

\end{mcode}

On this representation, the function \texttt{glue w} is the identity function, as required.

\subsection{The overall structure}

Before we go into the details of the implementation of other operators on paragraphs, we outline the structure of the final program. The program presented below is based on the fact that a solution to \texttt{par0} is returned by \texttt{par3}, where

\begin{verbatim}
par3 :: Txt -> Paragraph
par3 ws = tile ws (map (length.concat.tail.par0) (tails ws), length ws)
\end{verbatim}

The function \texttt{tile} produces the required solution by exploiting the differencing equation for \verb"length . head":
\begin{mcode}

>tile :: Txt -> ([Length],Length) -> Paragraph
>tile ws ([],n) = []
>tile ws (m:ms,n) = ws1 : tile ws2 (drop l (m:ms),m)
> where l = n - m
>       (ws1,ws2) = splitAt l ws

\end{mcode}

(Here, \verb"splitAt l x" is a pair of lists, the first element of the pair being the first \texttt{l} elements of \texttt{x} and the second element being the remainder; \verb"drop l x" is the second component of \verb"splitAt l x".) The proof that this works is an induction over all tails of the argument, and a detailed exposition can be found in \cite{bird93d}. It is perhaps interesting to note that a program involving \texttt{tile} is the starting point for the paper by Hirschberg and Larmore \cite{hirschberg87a}; for us, it is part of a final optimisation. Adapting the algorithm developed in previous sections to the new representation of paragraphs, one can find functions \texttt{stepr} and \texttt{startr} ---~data refinements of \texttt{step} and \texttt{start}~--- such that

\begin{verbatim}
fold1 stepr startr (map length ws)
  = (map rep (fold1 step start ws), width ws, length ws)
\end{verbatim}

and so, by the scan lemma of the introductory section, which showed how the computation of \verb|fold1 f g| on all tails can be written in terms of \verb|scan1|,

\begin{verbatim}
scan1 stepr startr (map length ws)
  = zip3 (map (map rep . fold1 step start) (tails ws),
          map width (tails ws),
          map length (tails ws))
\end{verbatim}

(The function \texttt{zip3} `zips' in the obvious way a triple of lists, all of the same length, into a list of triples.) Let \verb"zs = scan1 stepr startr (map length ws)". Then

\begin{verbatim}
length ws = thd3 (head zs)
\end{verbatim}

and

\begin{verbatim}
  map (length . concat . tail . par0) (tails ws)
=   { rep }
  map (len_tl . rep . par0) (tails ws)
=   { par0 = last . fold1 step start }
  map (len_tl . last . map rep . fold1 step start) (tails ws)
=   { above }
  map (len_tl . last . fst3) zs
\end{verbatim}

The resulting program is below. (Recall that the dashed list operations are the operations on symmetric lists, defined in the appendix.)

\begin{mcode}

>par3 :: Txt -> Paragraph
>par3 ws
> = tile ws (map (len_tl.last'.fst3) zs, thd3 (head zs))
>   where zs = scan1 stepr startr (map length ws)

\end{mcode}

\subsection{Implementing the data refinement}

It remains to give appropriate definitions of \texttt{stepr} and \texttt{startr}. The definition of \texttt{startr} is

\begin{mcode}

>startr :: Length -> (SymList Par, Width, Length)
>startr a | a <= maxw = (cons' (0,0,0) nil',a,1)

\end{mcode}

The definition of \texttt{stepr} mirrors that in the preceding section, except that all operations on paragraphs have been data-refined to the new representation of paragraphs. Those modifications are justified by the differencing equations stated above, and the following definitions are immediate consequences of those identities:

\begin{mcode}

>stepr :: Length ->
>         (SymList Par, Width, Length) ->
>         (SymList Par, Width, Length)
>stepr w (ps,tw,tl)
> = (trim (drop_nofit (new (last' ps) `add` ps)), tot_width, tot_len)
>   where
>   single p = len_tl p == 0
>   cost p
>     | single p = 0
>     | otherwise = cost_tl p + (optw - width_hd p)^2
>   width_hd p
>     | single p = tot_width
>     | otherwise = tot_width - width_tl p - 1
>   tot_width = w + 1 + tw
>   tot_len = 1 + tl

\end{mcode}

The operator \texttt{new} adds a new line to the front of a paragraph.
It is important that, in computing the cost of the tail of the newly created paragraph, we use the old width of the head, that is, without taking the new word \texttt{w} into account:

\begin{mcode}

>   new p | single p = (tw,0,tl)
>         | otherwise = (tw,cost_tl p + (optw-old_width_hd p)^2,tl)
>   old_width_hd p | single p = tw
>                  | otherwise = tw - width_tl p - 1

\end{mcode}

The definition of \texttt{trim} is not changed at all:

\begin{mcode}

>   trim ps_pq | null' ps_pq = ps_pq
>              | single' ps_pq = ps_pq
>              | cost p <= cost q = trim ps_p
>              | otherwise = ps_pq
>     where ps_p = init' ps_pq
>           q = last' ps_pq
>           p = last' ps_p

\end{mcode}

whereas \verb"drop_nofit" is an implementation of \verb"droptail (not . fitH)", using the new implementation \verb"width_hd" of \verb"width . head".

\begin{mcode}

>   drop_nofit ps_p | null' ps_p = ps_p
>                   | width_hd p > maxw = drop_nofit ps
>                   | otherwise = ps_p
>     where ps = init' ps_p
>           p = last' ps_p

\end{mcode}

The definition of \texttt{add} is similarly unaffected. On an intuitive level, there seems to be a duality between the ways \texttt{trim} and \texttt{add} operate, but we have been unable to bring this out in the code, partly because the trimming side also has to perform the filtering embodied in \verb"drop_nofit". \marginnote{also because \texttt{trim} compares \texttt{f p} with \texttt{f q}, whereas \texttt{add} compares \texttt{f p q} with \texttt{f q r}}

\begin{mcode}

>   add p qr_rs | single' qr_rs || null' qr_rs = cons' p qr_rs
>               | bf p q <= bf q r = add p r_rs
>               | otherwise = cons' p qr_rs
>     where r_rs = tail' qr_rs
>           q = head' qr_rs
>           r = head' r_rs

\end{mcode}

Finally, the data-refined version of \texttt{bf} becomes

\begin{mcode}

>   bf p q
>     | single q && cost_tl p == 0 = (optw - wph) `min` rqh
>     | single q = rqh
>     | otherwise = ceildiv (cost p-cost q)
>                           (2*(wqh-wph)) `min` rqh
>     where
>     wph = width_hd p
>     wqh = width_hd q
>     rqh = maxw - wqh + 1

>ceildiv n m = (n+m-1) `div` m

\end{mcode}

It is not hard to check that program \texttt{par3} does indeed have (amortised) linear time complexity. This theoretical bound is confirmed in computational experiments, and for all but the smallest inputs, \texttt{par3} outperforms the standard algorithm \texttt{par1}.

\section{Haskell vs C++}

We now have the ingredients for writing a program that has the same functionality as the Unix utility \emph{fmt}, although its output will be far superior (\emph{fmt} uses a naive greedy strategy, and the resulting paragraphs are \emph{not} visually pleasing). We shall make use of the functions

\begin{verbatim}
parse :: String -> [Paragraph]
unparse :: [Paragraph] -> String
\end{verbatim}

which are well-known text-processing primitives in functional programming \cite{bird88}. Their definitions are included in an appendix to this paper. Using these primitives, our implementation of \texttt{fmt} takes a single line:

\begin{mcode}

>fmt = unparse . map (par3 . concat) . parse

\end{mcode}

\iffalse
\begin{mcode}

>fmtWith par = unparse . map (par . concat) . parse

\end{mcode}
\fi

Joe Programmer may not be happy about this implementation of a high-quality \texttt{fmt}. Although there is no algorithm gap, one might expect a \emph{performance gap} between the Haskell program and an implementation of the same algorithm in a more conventional language. To measure the performance gap we compared the Haskell program for \texttt{fmt} to a hand-coded C++ implementation that is in close correspondence to the program presented here.
The conventional program in C++ does make extensive use of destructive updates, however, and the implementation of symmetric lists is replaced by an array implementation. Because the program only makes use of {\em cons} (and not of {\em snoc}), we can implement it by declaring an array whose size is an upper bound on the number of words in a paragraph, with two pointers that indicate the beginning and end of the symmetric list. (If we also had a {\em snoc} operation, we would need to use the folklore circular array code for queues.) All data structures in the conventional program are therefore of fixed size. Appropriate size bounds were determined by experimentation. The conventional program is of course longer than the Haskell program, but this is mostly due to the unwieldy syntax: measured in characters, the difference is only about a third. Personally, we found the conventional code much harder to write because it uses a lot of indexing in arrays, as opposed to the standard list processing functions in Haskell. In writing the C++ code, we attempted to apply all the standard tricks that good C++ programmers employ to speed up their programs. For example, index calculations were avoided through use of pointers, and we provided ample hints to the compiler through {\em const} declarations and {\em inline} directives. To check that we did indeed conform to good practice in writing the C++ program, we compared its performance to that of the \verb|fmt| utility: our code for the sophisticated algorithm is only 18\% slower than \verb|fmt|. By contrast, the Haskell program in this paper has {\em not} been fine-tuned for performance at all, and we directly compiled the \LaTeX\ source of this paper. It follows that the performance measurements reported give an edge to C++. All three programs were compiled on a Pentium II processor, running RedHat Linux. For Haskell we used version 4.01 of the Glasgow compiler \texttt{ghc}, because it produces the best code of all Haskell compilers available. The same code was also compiled with \texttt{hbc}, which also has a good reputation for speed and reliability. For C++ we used the Gnu compiler. All three executables were reduced in size using the utility \texttt{strip}. The Haskell executables are, as expected, vastly larger than the C++ code ---~they differ by at least a factor of 25. In all cases we switched on all optimizers. This has a spectacular effect for the \texttt{ghc} program: it ran more than four times faster than without the optimisation switch. Indeed, this is where we claw back some of the gains obtained by hand-coded optimisations in the C++ code: the \texttt{ghc} compiler aggressively applies optimising program transformations \cite{peyton-jones98}. To compare the performance of the three executables, we formatted the full text of Thomas Hardy's \emph{Far from the Madding Crowd}, an ASCII file of approximately 780~KB \cite{gutenberg}. The three programs were run to format this file for a maximum line width of 70 characters and an optimum width of 63. The CPU time was measured using the \texttt{time} command provided by the Linux \texttt{bash} shell. The \texttt{ghc} executable is about twice as fast as the \texttt{hbc} program, which shows how much can be achieved by automatic transformation of Haskell programs. The C++ program is eleven times faster again, which reflects the effort put into the respective compilers, and the fact that we did not bother to fine-tune the Haskell code.
The table below summarises the above comparison: the first two columns compare the programs with respect to their textual length (lines and characters), the third column is the size of the executable (in KB), and the last column shows their execution time (in CPU seconds).

\begin{center}
\begin{tabular}{l|rrrr}
 & lines & chars & size (KB) & time (s) \\\hline
Haskell (hbc) & 183 & 4676 & 196 & 10.34 \\
Haskell (ghc) & 183 & 4676 & 453 & 4.72 \\
C++ & 416 & 6310 & 8 & 0.43 \\\hline
Haskell (ghc)/(hbc) & 1.00 & 1.00 & 2.31 & 0.46 \\
Haskell (ghc)/C++ & 0.44 & 0.74 & 56.63 & 10.98
\end{tabular}
\end{center}

In summary, the performance gap is not all that great; it furthermore seems likely that advances in compiler technology (illustrated by the difference between \texttt{hbc} and \texttt{ghc}) will cancel the remaining advantages of languages like C++ over Haskell in the next few years.

\section{Discussion}

This paper was an experiment in using a functional language for presenting a non-trivial algorithm in a semi-formal style. We personally believe that for a large class of problems, this style of presentation is adequate, at once closing the algorithm gap and reconciling algorithm design with formal methods. The comparison with the hand-coded conventional implementation indicates that for non-trivial algorithms like the one presented here, the performance gap is rather small too. There are, however, two unsatisfactory aspects of the material presented here:
\begin{itemize}
\item First, we are not entirely satisfied with the semi-formal style of this paper. Up to the introduction of \texttt{trim}, the program derivation is absolutely standard, and no invention is involved in synthesizing the program. That part of the paper could easily be cast in calculational form, given the right machinery. The invention of the `bump factor', and its role in `forecasting the future', is however rather \emph{ad hoc}, and escapes, at present, an elegant calculational treatment. This is unsatisfactory, especially since the technique seems more generally applicable.
\item Second, we are very dissatisfied with the way one has to program differencing in a functional language. In a sense this is the least interesting part of the programming process, and yet it is quite error-prone. Moreover, differencing destroys some of the delightful elegance that characterises the functional expression of the standard algorithm. Meta-programming features in the spirit of Paige's \texttt{invariant} construct \cite{paige86}, such as those espoused by Smith~\cite{smith90} and Liu~\cite{liu95}, might be used to circumvent this problem, but unfortunately we do not know of any modern functional language that supports those ideas.
\end{itemize}
Finally, the algorithm presented here is representative of a large class of ingenious algorithms, collectively known under the name \emph{sparse dynamic programming} \cite{eppstein92b}. It would be nice to see whether a generic treatment of this class of algorithms is possible, in the style of \cite{demoor95}. It seems that such a generic approach is within reach, but we have not investigated this in any depth.

\iffalse
\bibliographystyle{plain}
\bibliography{new}
\else
\begin{thebibliography}{10} \raggedright

\bibitem{bird86} R.~S. Bird. \newblock Transformational programming and the paragraph problem. \newblock {\em Science of Computer Programming}, 6(2):159--189, 1986.

\bibitem{bird90} R.~S. Bird. \newblock A calculus of functions for program derivation.
\newblock In D.~A. Turner, editor, {\em Research Topics in Functional Programming}, University of Texas at Austin Year of Programming Series, pages 287--308. Addison--Wesley, 1990.

\bibitem{bird93d}
R.~S. Bird and O.~De~Moor.
\newblock List partitions.
\newblock {\em Formal Aspects of Computing}, 5(1):61--78, 1993.

\bibitem{bdm96}
R.~S. Bird and O.~De~Moor.
\newblock {\em Algebra of Programming}.
\newblock International Series in Computer Science. Prentice--Hall, 1996.

\bibitem{bird88}
R.~S. Bird and P.~Wadler.
\newblock {\em Introduction to Functional Programming}.
\newblock International Series in Computer Science. Prentice--Hall, 1988.

\bibitem{chin90}
W.~N. Chin.
\newblock {\em Automatic Methods for Program Transformation}.
\newblock Ph.\,D.\ thesis, Imperial College, London, 1990.

\bibitem{demoor95}
O.~De~Moor.
\newblock A generic program for sequential decision processes.
\newblock In M.~Hermenegildo and D.~S. Swierstra, editors, {\em Programming Languages: Implementations, Logics, and Programs}, volume 982 of {\em Lecture Notes in Computer Science}. Springer--Verlag, 1995.

\bibitem{eppstein92b}
D.~Eppstein, Z.~Galil, R.~Giancarlo, and G.~F. Italiano.
\newblock Sparse dynamic programming \uppercase{II}: Convex and concave cost functions.
\newblock {\em Journal of the ACM}, 39(3):546--567, 1992.

\bibitem{galil89}
Z.~Galil and R.~Giancarlo.
\newblock Speeding up dynamic programming with applications to molecular biology.
\newblock {\em Theoretical Computer Science}, 64:107--118, 1989.

\bibitem{hirschberg87a}
D.~S. Hirschberg and L.~L. Larmore.
\newblock The least weight subsequence problem.
\newblock {\em SIAM Journal on Computing}, 16(4):628--638, 1987.

\bibitem{hirschberg87b}
D.~S. Hirschberg and L.~L. Larmore.
\newblock New applications of failure functions.
\newblock {\em Journal of the ACM}, 34(3):616--625, 1987.

\bibitem{hoogerwoord92}
R.~R. Hoogerwoord.
\newblock A symmetric set of efficient list operations.
\newblock {\em Journal of Functional Programming}, 2(4):505--513, 1992.

\bibitem{knuth81}
D.~E. Knuth and M.~F. Plass.
\newblock Breaking paragraphs into lines.
\newblock {\em Software: Practice and Experience}, 11:1119--1184, 1981.

\bibitem{liu95}
Y.~A. Liu and T.~Teitelbaum.
\newblock Systematic derivation of incremental programs.
\newblock {\em Science of Computer Programming}, 24(1):1--39, 1995.

\bibitem{morgan94}
C.~C. Morgan.
\newblock {\em Programming from Specifications}.
\newblock International Series in Computer Science. Prentice--Hall, second edition, 1994.

\bibitem{paige86}
R.~Paige.
\newblock Programming with invariants.
\newblock {\em IEEE Software}, 3(1):56--69, 1986.

\bibitem{peyton-jones98}
S.~L. Peyton~Jones and A.~L.~M. Santos.
\newblock A transformation-based optimiser for Haskell.
\newblock {\em Science of Computer Programming}, 32(1--3):3--47, 1998.

\bibitem{pettorossi84}
A.~Pettorossi.
\newblock {\em Methodologies for Transformations and Memoing in Applicative Languages}.
\newblock Ph.\,D.\ thesis, Department of Computer Science, University of Edinburgh, 1984.

\bibitem{smith90}
D.~R. Smith.
\newblock KIDS: A semi-automatic program development system.
\newblock {\em IEEE Transactions on Software Engineering}, 16(9):1024--1043, 1990.

\bibitem{gutenberg}
Project Gutenberg.
\newblock Many original texts as ASCII files, 1994.
\newblock Available by ftp from {\tt mrcnext.cso.uiuc.edu}.

\end{thebibliography}
\fi

\section*{Appendix: Symmetric lists}

The implementation of symmetric lists given below is explained in some depth in \cite{hoogerwoord92}.
Briefly, a list \texttt{x} is represented as a pair of lists \texttt{(y,z)} such that \verb"abs (y,z)"~=~\verb"x", where the \emph{abstraction function} \texttt{abs} is defined by
\begin{verbatim}
  abs (y,z) = y ++ reverse z
\end{verbatim}
Moreover, the following invariant is maintained: if either of the two lists is empty, the other is empty or a singleton. The operations below implement their non-symmetric counterparts in the sense that
\begin{verbatim}
  head' = head . abs
  abs . tail' = tail . abs
\end{verbatim}
and so on. The implementation is such that each operation takes amortised constant time.
\begin{mcode}

>type SymList a = ([a],[a])

>single' (x,y) = (null x && single y) || (single x && null y)

>null' ([],[]) = True
>null' _       = False

>nil' = ([],[])

>head' (x,y) | not (null x) = head x
>            | otherwise    = head y

>last' (y,x) | not (null x) = head x
>            | otherwise    = head y

>cons' a (x,y) | not (null y) = (a:x,y)
>              | otherwise    = ([a],x)

>snoc' a (y,x) | not (null y) = (y,a:x)
>              | otherwise    = (x,[a])

>tail' (x,y) | null x    = ([],[])
>            | single x  = (reverse y1, y0)
>            | otherwise = (tail x, y)
>   where (y0,y1) = splitAt (length y `div` 2) y

>init' (y,x) | null x    = ([],[])
>            | single x  = (y0, reverse y1)
>            | otherwise = (y, tail x)
>   where (y0,y1) = splitAt (length y `div` 2) y

\end{mcode}

\section*{Appendix: Text processing}

The text processing package given below is explained in \cite{bird88}. It provides primitives for converting between strings and lines, lines and words, and paragraphs and lines. In each case, the forward conversion can be programmed using the generic solution \texttt{format}, and the backward conversion using \texttt{unformat}. The definitions of \texttt{unlines}, \texttt{lines}, \texttt{unwords} and \texttt{words} have been commented out because they are already defined in the standard Haskell prelude. The function \texttt{id} is the identity function.
\begin{mcode}

>unformat :: a -> [[a]] -> [a]
>unformat a = fold1 insert id
>   where insert xs ys = xs ++ [a] ++ ys

>format :: Eq a => a -> [a] -> [[a]]
>format a [] = [[]]
>format a x  = fold1 (break a) (start a) x
>   where break a b xs | a == b    = []:xs
>                      | otherwise = (b:head xs):tail xs
>         start a b = break a b [[]]

*unlines = unformat '\n'
*lines = format '\n'

*unwords = unformat ' '
*words = filter (/=[]) . format ' '

>unparas :: [[[String]]] -> [[String]]
>unparas = unformat []

>paras :: [[String]] -> [[[String]]]
>paras = filter (/=[]) . format []

>parse = paras . map words . lines
>unparse = unlines . map unwords . unparas

\end{mcode}
\end{document}

The simple greedy algorithm:

>parg :: Txt -> Paragraph
>parg = foldl nextword [[]]
> where
>  nextword p w | fits (last p++[w]) = init p ++ [last p ++ [w]]
>               | otherwise          = p ++ [[w]]

>fmtg = fmtWith parg

For comparison, the quadratic algorithm:

>fmt1 = fmtWith par1

Some test data:

>test =
>  "In the constructive programming community it is commonplace to see " ++
>  "formal developments of textbook algorithms. In the algorithm design " ++
>  "community, on the other hand, it may be well known that the textbook " ++
>  "solution to a problem is not the most efficient possible. However, in " ++
>  "presenting the more efficient solution, the algorithm designer will " ++
>  "usually omit some of the implementation details, thus creating an " ++
>  "algorithm gap between the abstract algorithm and its concrete " ++
>  "implementation. This is in contrast to the formal development, which " ++
>  "usually presents the complete concrete implementation of the less " ++
>  "efficient solution. \n \n"
>tests = concat (repeat test)

>main = getArgs >>= (\as ->
>  if length as /= 1
>  then putStr "usage: para <file>"
>  else openFile (head as) ReadMode >>= (\h ->
>       hGetContents h >>= (\ws ->
>       putStr (if null ws then [] else (fmt ws)))))