% Copyright (C) 1991, 1995, 1996, 1998, 1999 Aladdin Enterprises.  All rights reserved.
%
% This software is provided AS-IS with no warranty, either express or
% implied.
%
% This software is distributed under license and may not be copied,
% modified or distributed except as expressly authorized under the terms
% of the license contained in the file LICENSE in this distribution.
%
% For more information about licensing, please refer to
% http://www.ghostscript.com/licensing/. For information on
% commercial licensing, go to http://www.artifex.com/licensing/ or
% contact Artifex Software, Inc., 101 Lucas Valley Road #110,
% San Rafael, CA 94903, U.S.A., +1(415)492-9861.

% $Id: ps2ascii.ps,v 1.10 2004/06/23 09:04:17 igor Exp $
% Extract the ASCII text from a PostScript file.  Nothing is displayed.
% Instead, ASCII information is written to stdout.  The idea is similar to
% Glenn Reid's `distillery', only a lot more simple-minded, and less robust.

% If SIMPLE is defined, just the text is written, with a guess at line
% breaks and word spacing.  If SIMPLE is not defined, lines are written
% to stdout as follows:
%
%       F <height> <width> (<fontname>)
%               Indicate the font height and width (normally taken from
%               the font's FontBBox).
%
%       P
%               Indicate the end of the page.
%
%       S <x> <y> (<string>) <width>
%               Display a string.
%
% <width> and <height> are integer dimensions in units of 1/720".
% <x> and <y> are integer coordinates, in units of 1/720", with the origin
%   at the lower left.
% <string> and <fontname> are strings represented with the standard
%   PostScript escape conventions.

% If COMPLEX is defined, the following additional types of lines are
% written to stdout.
%
%       C <r> <g> <b>
%               Indicate the current color.
%
%       I <x> <y> <width> <height>
%               Note the presence of an image.
%
%       R <x> <y> <width> <height>
%               Fill a rectangle.
%
% <r>, <g>, and <b> are RGB values expressed as integers between 0 and 1000.
%
% Note that future versions of this program (in COMPLEX mode) may add
% other output elements, so programs parsing the output should be
% prepared to ignore elements that they do not recognize.
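
% Worked example of the S line.  This is illustrative only: the string and
% its advance width are hypothetical, not output captured from a real run.
% One unit is 1/720", i.e. a tenth of a point.  A string "Hello" that starts
% one inch from the left edge and ten inches above the bottom edge of a
% portrait page, with an advance width of 30 points, would be reported as
%
%       S 720 7200 (Hello) 300
%
% because 1" = 720 units, 10" = 7200 units, and 30 points = 300 units.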
% Note that this code will only work in all cases if systemdict is writable
% and if `binding' the definitions of operators defined as procedures
% is deferred.  For this reason, it is normally invoked with
%       gs -q -dNODISPLAY -dDELAYBIND -dWRITESYSTEMDICT ps2ascii.ps

% Thanks to:
%       J Greely for improvements to this code;
%       Jerry Whelan for motivating other improvements;
%       David M. Jones for improvements noted below.

%%  Additional modifications by David M. Jones
%%  (dmjones@theory.lcs.mit.edu), December 23, 1997
%%
%%  (a) Rewrote forall loop at the end of .show.write.  This fixes a
%%      stack leakage problem, but the changes are more significant
%%      than that.
%%
%%      .char.map includes the names of all characters in the
%%      StandardEncoding, ISOLatin1Encoding, OT1Encoding and
%%      T1Encoding vectors.  Thus, if the Encoding vector for the
%%      current font contains a name that is not in .char.map, it's
%%      redundant to check if the Encoding vector is equal to one of
%%      the known vectors.  Previous versions of ps2ascii would give
%%      up at this point, and substitute an asterisk (*) for the
%%      character.  I've taken the liberty of instead using the
%%      OT1Encoding vector to translate the character, on the grounds
%%      that in the cases I'm most interested in, a font without a
%%      useful Encoding vector was most likely created by a DVI to PS
%%      converter such as dvips or DVILASER (and OT1Encoding is
%%      largely compatible with StandardEncoding anyway).  [Note that
%%      this does not make my earlier changes to support dvips (see
%%      fix (a) under my 1996 changes) completely obsolete, since
%%      there's additional useful information I can extract in that
%%      case.]
%%
%%      Overall, this should provide better support for some documents
%%      (e.g., DVILASER documents will no longer be translated into a
%%      series of *'s) without breaking any other documents any worse
%%      than they already were broken.
%%
%%  (b) Fixed two bugs in dvips.df-tail: (1) changed "dup 127" to
%%      "dup 128" to fix a fencepost error, and (2) gave each font its
%%      own FontName rather than having all fonts share the same name.
%%
%%  (c) Added one further refinement to the heuristic for detecting
%%      paragraph breaks: do not ever start a new paragraph after a
%%      line ending in a hyphen.
%%
%%  (d) Added a bunch of missing letters from the T1Encoding,
%%      OT1Encoding and ISOLatin1Encoding vectors to .letter.chars to
%%      improve the hyphen-elimination algorithm.  This still won't
%%      help if there's no useful Encoding vector.
%%
%%  NOTE: A better solution to the problem of missing Encoding vectors
%%  might be to redefine definefont to check whether the Encoding
%%  vector is sensible and, if not, replace it by a default.  This
%%  would alleviate the need for constant tests in the .show.write
%%  loop, as well as automatically solving the problem noted in fix
%%  (d) above, and the similar problem with .break.chars.  This should
%%  be investigated.  Also, the hyphen-elimination algorithm really
%%  needs to be looked at carefully and rethought.

%%* Modifications to ps2ascii.ps by David M. Jones
%%* (dmjones@theory.lcs.mit.edu), June 25-July 8, 1996

%%* Modifications:
%%*
%%* (a) added code to give better support for dvips files by providing
%%*     FontBBoxes, FontNames and Encoding vectors for downloaded
%%*     bitmap fonts.  This is done by using dvips's start-hook to
%%*     overwrite the df-tail and D procedures that dvips uses to
%%*     define its Type 3 bitmap fonts.  Thus, this change should
%%*     provide better support for dvips-generated PS files without
%%*     affecting the handling of other documents.
%%*
%%* (b) Fixed two bugs that could potentially affect any PS file, not
%%*     just those created by dvips: (1) added missing "get" operator
%%*     in .show.write and (2) fixed bug that caused a hyphen at the
%%*     end of a line to be replaced by a space rather than being
%%*     deleted.
%%*     Note that the first bug was a source of stack
%%*     leakage, causing ps2ascii to run out of operand stack space
%%*     occasionally.
%%*
%%*     Search for "%%* BF" to find these modifications.
%%*
%%* (c) Improved the heuristic for determining whether a line break
%%*     has occurred and whether a line break represents a paragraph
%%*     break.  Previously, any change in the vertical position caused
%%*     a line break; now a line break is only registered if the
%%*     change is larger than the height of the current font.  This
%%*     means that superscripts, subscripts, and such things as
%%*     shifted accents generated by TeX won't cause line breaks.
%%*     Paragraph-recognition is now done by comparing the indentation
%%*     of the new line to the indentation of the previous line and by
%%*     comparing the vertical distance between the new line and the
%%*     previous line to the vertical distance between the previous
%%*     line and its predecessor.
%%*
%%* (d) Added a hook for renaming the files where stdout and stderr
%%*     go.
%%*
%%* In general, my additions or changes to the code are described in
%%* comments beginning with "%%*".  However, there are numerous other
%%* places where I have either re-formatted code or added comments to
%%* the code while I was trying to understand it.  These are usually
%%* not specially marked.
%%*

/QUIET true def

systemdict wcheck { systemdict } { userdict } ifelse begin
/.max where { pop } { /.max { 2 copy lt { exch } if pop } bind def } ifelse
/COMPLEX dup where { pop true } { false } ifelse def
/SIMPLE dup where { pop true } { false } ifelse def
/setglobal where
 { pop currentglobal /setglobal load true setglobal }
 { { } }
ifelse

% Define a way to store and retrieve integers that survives save/restore.
% .i.string0 is padded with spaces so that each cell created by .inew is
% long enough to hold the decimal form of any integer stored with .iput.
/.i.string0 (0               ) def
/.i.string .i.string0 length string def
/.iget { cvi } bind def
/.iput { exch //.i.string exch copy cvs pop } bind def
/.inew { //.i.string0 dup length string copy } bind def

% We only want to redefine operators if they are defined already.
/codef { 1 index where { pop def } { pop pop } ifelse } def

% Redefine the end-of-page operators.

/erasepage { } codef
/copypage
 { SIMPLE { (\014) } { (P\n) } ifelse //print }
codef
/showpage
 { copypage erasepage initgraphics }
codef

% Redefine the fill operators to detect rectangles.

/.orderrect     % <llx> <lly> <urx> <ury> .orderrect <llx> <lly> <w> <h>
 {      % Ensure llx <= urx, lly <= ury.
   1 index 4 index lt { 4 2 roll } if
   dup 3 index lt { 3 1 roll exch } if
   exch 3 index sub exch 2 index sub
 } odef
/.fillcomplex
 {      % Do a first pass to see if the path is all rectangles in
        % the output coordinate system.  We don't worry about overlapping
        % rectangles that might be partially not filled.
        % Stack: mark llx0 lly0 urx0 ury0 ... true mark x0 y0 ...
   mark true mark
        % Add a final moveto so we pick up any trailing unclosed subpath.
   0 0 itransform moveto
    { .coord counttomark 2 gt
       { counttomark 4 gt { .fillcheckrect } { 4 2 roll pop pop } ifelse }
      if
    }
    { .coord }
    { cleartomark not mark exit }
    { counttomark -2 roll 2 copy counttomark 2 roll .fillcheckrect }
   pathforall cleartomark
    { .showcolor
      counttomark 4 idiv
       { counttomark -4 roll .orderrect
         (R ) //print .show==4
       }
      repeat pop
    }
    { cleartomark
    }
   ifelse
 } odef
/.fillcheckrect
 {      % Check whether the current subpath is a rectangle.
        % If it is, add it to the list of rectangles being accumulated;
        % if not, exit the .fillcomplex loop.
        % The subpath has not been closed.
        % Stack: as in .fillcomplex, + newx newy
   counttomark 10 eq { 9 index 9 index 4 2 roll } if
   counttomark 12 ne { cleartomark not mark exit } if
   12 2 roll
        % Check for the two possible forms of rectangles:
        %       x0 y0  x0 y1  x1 y1  x1 y0  x0 y0
        %       x0 y0  x1 y0  x1 y1  x0 y1  x0 y0
   9 index 2 index eq 9 index 2 index eq and
   10 index 9 index eq
    {   % Check for first form.
      7 index 6 index eq and 6 index 5 index eq and 3 index 2 index eq and
    }
    {   % Check for second form.
      9 index 8 index eq and
      8 index 7 index eq and 5 index 4 index eq and 4 index 3 index eq and
    }
   ifelse not { cleartomark not mark exit } if
        % We have a rectangle.
   pop pop pop pop 4 2 roll pop pop 8 4 roll
 } odef
/eofill { COMPLEX { .fillcomplex } if newpath } codef
/fill { COMPLEX { .fillcomplex } if newpath } codef
/rectfill { gsave newpath .rectappend fill grestore } codef
/ueofill { gsave newpath uappend eofill grestore } codef
/ufill { gsave newpath uappend fill grestore } codef

% Redefine the stroke operators to detect rectangles.

/rectstroke
 { gsave newpath
   dup type dup /arraytype eq exch /packedarraytype eq or
    { dup length 6 eq { exch .rectappend concat } { .rectappend } ifelse }
    { .rectappend }
   ifelse stroke grestore
 } codef
/.strokeline    % <fromx> <fromy> <tox> <toy> .strokeline <tox'> <toy'>
                % Note: fromx and fromy are in output coordinates;
                % tox and toy are in user coordinates; tox' and toy'
                % are tox and toy converted to output coordinates.
 { .coord 2 copy 6 2 roll .orderrect
                % Add in the line width.  Assume square or round caps.
   currentlinewidth 2 div dup .dcoord add abs 1 .max 5 1 roll
   4 index add 4 1 roll 4 index add 4 1 roll
   4 index sub 4 1 roll 5 -1 roll sub 4 1 roll
   (R ) //print .show==4
 } odef
/.strokecomplex
 {      % Do a first pass to see if the path is all horizontal and vertical
        % lines in the output coordinate system.
        % Stack: true mark origx origy curx cury
   true mark null null null null
    { .coord 6 2 roll pop pop pop pop 2 copy }
    { .coord 1 index 4 index eq 1 index 4 index eq or
       { 4 2 roll pop pop }
       { cleartomark not mark exit }
      ifelse
    }
    { cleartomark not mark exit }
    { counttomark -2 roll 2 copy counttomark 2 roll
      1 index 4 index eq 1 index 4 index eq or
       { pop pop 2 copy }
       { cleartomark not mark exit }
      ifelse
    }
   pathforall cleartomark
   0 currentlinewidth .dcoord 0 eq exch 0 eq or and
        % Do the second pass to write out the rectangles.
        % Stack: origx origy curx cury
    { .showcolor
      null null null null
       { 6 2 roll pop pop pop pop 2 copy .coord }
       { .strokeline }
       { }
       { 3 index 3 index .strokeline }
      pathforall pop pop pop pop
    }
   if
 } odef
/stroke { COMPLEX { .strokecomplex } if newpath } codef
/ustroke
 { gsave newpath
   dup length 6 eq { exch uappend concat } { uappend } ifelse
   stroke grestore
 } codef

% The image operators must read the input and note the dimensions.
% Eventually we should redefine these to detect 1-bit-high all-black images,
% since this is how dvips does underlining (!).

/.noteimagerect         % <width> <height> <matrix> .noteimagerect -
 { COMPLEX
    { gsave setmatrix itransform 0 0 itransform
      grestore .coord 4 2 roll .coord .orderrect
      (I ) //print .show==4
    }
    { pop pop pop
    }
   ifelse
 } odef
/colorimage where
 { pop /colorimage
    { 1 index
       { dup 6 add index 1 index 6 add index 2 index 5 add index }
       { 6 index 6 index 5 index }
      ifelse .noteimagerect gsave nulldevice //colorimage grestore
    } codef
 } if
/.noteimage             % Arguments as for image[mask]
 { dup type /dicttype eq
    { dup /Width get 1 index /Height get 2 index /ImageMatrix get }
    { 4 index 4 index 3 index }
   ifelse .noteimagerect
 } odef
/image { .noteimage gsave nulldevice //image grestore } codef
/imagemask { .noteimage gsave nulldevice //imagemask grestore } codef

% Output the current color if necessary.
/.color.r .inew def
  .color.r -1 .iput             % make sure we write the color at the beginning
/.color.g .inew def
/.color.b .inew def
/.showcolor
 { COMPLEX
    { currentrgbcolor
      1000 mul round cvi
      3 1 roll 1000 mul round cvi
      exch 1000 mul round cvi
                % Stack: b g r
      dup //.color.r .iget eq
      2 index //.color.g .iget eq and
      3 index //.color.b .iget eq and
       { pop pop pop
       }
       { (C ) //print
         dup //.color.r exch .iput .show==only ( ) //print
         dup //.color.g exch .iput .show==only ( ) //print
         dup //.color.b exch .iput .show==only (\n) //print
       }
      ifelse
    }
   if
 } bind def

% Redefine `show'.

% Set things up so our output will be in tenths of a point, with origin at
% lower left.
% This isolates us from the peculiarities of individual devices.

/.show.ident.matrix matrix def
/.show.ident {          % - .show.ident <scale> <matrix>
%       //.show.ident.matrix defaultmatrix
%               % Assume the original transformation is well-behaved.
%       0.1 0 2 index dtransform abs exch abs .max /.show.scale exch def
%       0.1 dup 3 -1 roll scale
  gsave initmatrix
                % Assume the original transformation is well-behaved...
  0.1 0 dtransform abs exch abs .max
  0.1 dup scale .show.ident.matrix currentmatrix
                % ... but undo any rotation into landscape orientation.
  dup 0 get 0 eq {
    1 get dup abs div 90 mul rotate
    .show.ident.matrix currentmatrix
  } if
  grestore
} bind def
/.coord {               % <x> <y> .coord <x'> <y'>
  transform .show.ident exch pop itransform
  exch round cvi exch round cvi
} odef
/.dcoord {              % <dx> <dy> .dcoord <dx'> <dy'>
                % Transforming distances is trickier, because
                % the coordinate system might be rotated.
                % The results are lengths, so they are never negative.
  .show.ident pop 3 1 roll
  exch 0 dtransform
    dup mul exch dup mul add sqrt
    2 index div round cvi
  exch 0 exch dtransform
    dup mul exch dup mul add sqrt
    3 -1 roll div round cvi
} odef

% Remember the current X, Y, and height.
/.show.x .inew def
/.show.y .inew def
/.show.height .inew def
% Remember the last character of the previous string; if it was a
% hyphen preceded by a letter, we didn't output the hyphen.
/.show.last (\000) def
% Remember the current font.
/.font.name 130 string def
/.font.name.length .inew def
/.font.height .inew def
/.font.width .inew def

%%* Also remember indentation of current line and previous vertical
%%* skip

/.show.indent .inew def
/.show.dy .inew def

% We have to redirect stdout somehow....

/.show.stdout { (%stdout) (w) file } bind def

% Make sure writing will work even if a program uses =string.
/.show.string =string length string def
/.show.=string =string length string def
/.show==only {
  //=string //.show.=string copy pop
  dup type /stringtype eq {
    dup length //.show.string length le {
      dup rcheck { //.show.string copy } if
    } if
  } if
  .show.stdout exch write==only
  //.show.=string //=string copy pop
} odef
/.show==4 {
  4 -1 roll .show==only ( ) //print
  3 -1 roll .show==only ( ) //print
  exch .show==only ( ) //print
  .show==only (\n) //print
} odef

/.showwidth     % Same as stringwidth, but disable COMPLEX so that
                % we don't try to detect rectangles during BuildChar.
{ COMPLEX {
    /COMPLEX false def stringwidth /COMPLEX true def
  } {
    stringwidth
  } ifelse
} odef

/.showfont      % - .showfont -
{ gsave
        % Try getting the height and width of the font from the FontBBox.
  currentfont /FontBBox .knownget not { {0 0 0 0} } if
  aload pop             % llx lly urx ury
  exch 4 -1 roll        % lly ury urx llx
  sub                   % lly ury dx
  3 1 roll exch         % dx ury lly
  sub                   % dx dy
  2 copy .max 0 ne {
    currentfont /FontMatrix get dtransform
  } {
    pop pop
        % Fonts produced by dvips, among other applications, have
        % BuildChar procedures that bomb out when given unexpected
        % characters, and there is no way to determine whether a given
        % character will do this.  So for Type 1 fonts, we measure a
        % typical character ('X'); for others, we punt.
    currentfont /FontType get 1 eq {
      (X) .showwidth pop dup 1.3 mul
    } {
        % No safe way to get the character size.  Punt.
      0 0
    } ifelse
  } ifelse
  .dcoord exch
  currentfont /FontName .knownget not { () } if
  dup type /stringtype ne { //.show.string cvs } if
  grestore
        % Stack: height width fontname
  SIMPLE {
    pop pop //.show.height exch .iput
  } {
    2 index //.font.height .iget eq
    2 index //.font.width .iget eq and
    1 index //.font.name 0 //.font.name.length .iget getinterval eq and {
      pop pop pop
    } {
      (F ) //print
      3 -1 roll dup //.font.height exch .iput .show==only ( ) //print
      exch dup //.font.width exch .iput .show==only ( ) //print
      dup length //.font.name.length exch .iput
      //.font.name cvs .show==only (\n) //print
    } ifelse
  } ifelse
} odef

% Define the letters -- characters which, if they occur followed by a hyphen
% at the end of a line, cause the hyphen and line break to be ignored.
/.letter.chars 100 dict def
mark 65 1 90 { dup 32 add } for
  counttomark
  { StandardEncoding exch get .letter.chars exch dup put }
repeat pop

%%* Add the rest of the letters from the [O]T1Encoding and
%%* ISOLatin1Encoding vectors

mark
    /AE /Aacute /Abreve /Acircumflex /Adieresis /Agrave /Aogonek /Aring
    /Atilde /Cacute /Ccaron /Ccedilla /Dcaron /Eacute /Ecaron /Ecircumflex
    /Edieresis /Egrave /Eng /Eogonek /Eth /Gbreve /Germandbls /IJ
    /Iacute /Icircumflex /Idieresis /Idot /Igrave /Lacute /Lcaron /Lslash
    /Nacute /Ncaron /Ntilde /OE /Oacute /Ocircumflex /Odieresis /Ograve
    /Ohungarumlaut /Oslash /Otilde /Racute /Rcaron /Sacute /Scaron /Scedilla
    /Tcaron /Tcedilla /Thorn /Uacute /Ucircumflex /Udieresis /Ugrave
    /Uhungarumlaut /Uring /Yacute /Ydieresis /Zacute /Zcaron /Zdot
    /aacute /abreve /acircumflex /adieresis /ae /agrave /aogonek /aring
    /atilde /cacute /ccaron /ccedilla /dbar /dcaron /dotlessi /dotlessj
    /eacute /ecaron /ecircumflex /edieresis /egrave /eng /eogonek /eth
    /exclamdown /ff /ffi /ffl /fi /fl /gbreve /germandbls
    /iacute /icircumflex /idieresis /igrave /ij /lacute /lcaron /lslash
    /nacute /ncaron /ntilde /oacute /ocircumflex /odieresis /oe /ograve
    /ohungarumlaut /oslash /otilde /questiondown /racute /rcaron /sacute
    /scaron /scedilla /section /sterling /tcaron /tcedilla /thorn
    /uacute /ucircumflex /udieresis /ugrave /uhungarumlaut /uring
    /yacute /ydieresis /zacute /zcaron /zdot

counttomark { .letter.chars exch dup put } repeat pop

% Define a set of characters which, if they occur at the start of a line,
% are taken as indicating a paragraph break.
/.break.chars 50 dict def
mark /bullet /dagger /daggerdbl /periodcentered /section
counttomark { .break.chars exch dup put } repeat pop

% Define character translation to ASCII.
% We have to do this for the entire character set.
/.char.map 500 dict def
/.chars.def { counttomark 2 idiv { .char.map 3 1 roll put } repeat pop } def
% Encode the printable ASCII characters.
mark 32 1 126
 { 1 string dup 0 4 -1 roll put
   dup 0 get StandardEncoding exch get exch
 }
for .chars.def
% Encode accents.
mark
  /acute (') /caron (^) /cedilla (,) /circumflex (^)
  /dieresis (") /grave (`) /ring (*) /tilde (~)
.chars.def
% Encode the ISO accented characters.
mark 192 1 255
 { ISOLatin1Encoding exch get =string cvs
   dup 0 1 getinterval 1 index dup length 1 sub 1 exch getinterval
   .char.map 2 index known .char.map 2 index known and
    { .char.map 3 -1 roll get .char.map 3 -1 roll get concatstrings
      .char.map 3 1 roll put
    }
    { pop pop pop
    }
   ifelse
 }
for .chars.def
% Encode the remaining standard and ISO alphabetic characters.
mark
  /AE (AE) /Eth (DH) /OE (OE) /Thorn (Th)
  /ae (ae) /eth (dh)
  /ffi (ffi) /ffl (ffl) /fi (fi) /fl (fl)
  /germandbls (ss) /oe (oe) /thorn (th)
.chars.def
% Encode the other standard and ISO characters.
mark
  /brokenbar (|) /bullet (*) /copyright ((C)) /currency (#)
  /dagger (#) /daggerdbl (##) /degree (o) /divide (/) /dotaccent (.)
  /dotlessi (i) /ellipsis (...) /emdash (--) /endash (-) /exclamdown (!)
  /florin (f) /fraction (/) /guillemotleft (<<) /guillemotright (>>)
  /guilsinglleft (<) /guilsinglright (>) /hungarumlaut ("") /logicalnot (~)
  /macron (_) /minus (-) /mu (u) /multiply (*)
  /ogonek (,) /onehalf (1/2) /onequarter (1/4) /onesuperior (1)
  /ordfeminine (-a) /ordmasculine (-o)
  /paragraph (||) /periodcentered (*) /perthousand (o/oo) /plusminus (+-)
  /questiondown (?) /quotedblbase (") /quotedblleft (") /quotedblright (")
  /quotesinglbase (,) /quotesingle (') /registered ((R))
  /section ($) /sterling (#)
  /threequarters (3/4) /threesuperior (3) /trademark ((TM)) /twosuperior (2)
  /yen (Y)
.chars.def
% Encode a few common Symbol characters.
mark
  /asteriskmath (*) /copyrightsans ((C)) /copyrightserif ((C))
  /greaterequal (>=) /lessequal (<=)
  /registersans ((R)) /registerserif ((R))
  /trademarksans ((TM)) /trademarkserif ((TM))
.chars.def

%%* Add a few characters from StandardEncoding and ISOLatin1Encoding
%%* that were missing.

mark
  /cent (c)
  /guilsinglleft (<)
  /guilsinglright (>)
  /breve (*)
  /Lslash (L/)
  /lslash (l/)
.chars.def

%%* Define the OT1Encoding and T1Encoding vectors for use with dvips
%%* files.  Unfortunately, there's no way of telling what font is
%%* really being used within a dvips document, so we can't provide an
%%* appropriate encoding for each individual font.  Instead, we'll
%%* just provide support for the two most popular text encodings, the
%%* OT1 and T1 encodings, and just accept the fact that any font not
%%* using one of those encodings will be rendered as gibberish.
%%*
%%* OT1 is Knuth's 7-bit encoding for the CMR text fonts, while T1
%%* (aka the Cork encoding) is the 8-bit encoding used by the DC
%%* fonts, a preliminary version of the proposed Extended Computer
%%* Modern fonts.  Unfortunately, T1 is not a strict extension of OT1;
%%* they differ in positions 8#000 through 8#040, 8#074, 8#076, 8#134,
%%* 8#137, 8#173, 8#174, 8#175 and 8#177, so we can't use the same
%%* vector for both.
%%*
%%* Of course, we also can't reliably tell the difference between an
%%* OT1-encoded font and a T1-encoded font based on the information in
%%* a dvips-created PostScript file.  As a best-guess solution, we'll
%%* use the T1 encoding if the font contains any characters in
%%* positions above 8#177 and the OT1 encoding if it doesn't.

/T1Encoding 256 array def

/OT1Encoding 256 array def

%%* T1Encoding shares a lot with StandardEncoding, so let's start
%%* there.

StandardEncoding T1Encoding copy pop

/OT1.encode {
    counttomark 2 idiv
    { OT1Encoding 3 1 roll put }
    repeat
    cleartomark
} def

/T1.encode {
    counttomark 2 idiv
    { T1Encoding 3 1 roll put }
    repeat
    cleartomark
} def

mark
    8#000 /grave        8#001 /acute          8#002 /circumflex     8#003 /tilde
    8#004 /dieresis     8#005 /hungarumlaut   8#006 /ring           8#007 /caron
    8#010 /breve        8#011 /macron         8#012 /dotaccent      8#013 /cedilla
    8#014 /ogonek       8#015 /quotesinglbase 8#016 /guilsinglleft  8#017 /guilsinglright
    8#020 /quotedblleft 8#021 /quotedblright  8#022 /quotedblbase   8#023 /guillemotleft
    8#024 /guillemotright 8#025 /endash       8#026 /emdash         8#027 /cwm
    8#030 /perthousandzero 8#031 /dotlessi    8#032 /dotlessj       8#033 /ff
    8#034 /fi           8#035 /fl             8#036 /ffi            8#037 /ffl
    %% 8#040 through 8#176 follow StandardEncoding
    8#177 /hyphen
T1.encode

mark
    8#200 /Abreve       8#201 /Aogonek        8#202 /Cacute         8#203 /Ccaron
    8#204 /Dcaron       8#205 /Ecaron         8#206 /Eogonek        8#207 /Gbreve
    8#210 /Lacute       8#211 /Lcaron         8#212 /Lslash         8#213 /Nacute
    8#214 /Ncaron       8#215 /Eng            8#216 /Ohungarumlaut  8#217 /Racute
    8#220 /Rcaron       8#221 /Sacute         8#222 /Scaron         8#223 /Scedilla
    8#224 /Tcaron       8#225 /Tcedilla       8#226 /Uhungarumlaut  8#227 /Uring
    8#230 /Ydieresis    8#231 /Zacute         8#232 /Zcaron         8#233 /Zdot
    8#234 /IJ           8#235 /Idot           8#236 /dbar           8#237 /section
    8#240 /abreve       8#241 /aogonek        8#242 /cacute         8#243 /ccaron
    8#244 /dcaron       8#245 /ecaron         8#246 /eogonek        8#247 /gbreve
    8#250 /lacute       8#251 /lcaron         8#252 /lslash         8#253 /nacute
    8#254 /ncaron       8#255 /eng            8#256 /ohungarumlaut  8#257 /racute
    8#260 /rcaron       8#261 /sacute         8#262 /scaron         8#263 /scedilla
    8#264 /tcaron       8#265 /tcedilla       8#266 /uhungarumlaut  8#267 /uring
    8#270 /ydieresis    8#271 /zacute         8#272 /zcaron         8#273 /zdot
    8#274 /ij           8#275 /exclamdown     8#276 /questiondown   8#277 /sterling
    8#300 /Agrave       8#301 /Aacute         8#302 /Acircumflex    8#303 /Atilde
    8#304 /Adieresis    8#305 /Aring          8#306 /AE             8#307 /Ccedilla
    8#310 /Egrave       8#311 /Eacute         8#312 /Ecircumflex    8#313 /Edieresis
    8#314 /Igrave       8#315 /Iacute         8#316 /Icircumflex    8#317 /Idieresis
    8#320 /Eth          8#321 /Ntilde         8#322 /Ograve         8#323 /Oacute
    8#324 /Ocircumflex  8#325 /Otilde         8#326 /Odieresis      8#327 /OE
    8#330 /Oslash       8#331 /Ugrave         8#332 /Uacute         8#333 /Ucircumflex
    8#334 /Udieresis    8#335 /Yacute         8#336 /Thorn          8#337 /Germandbls
    8#340 /agrave       8#341 /aacute         8#342 /acircumflex    8#343 /atilde
    8#344 /adieresis    8#345 /aring          8#346 /ae             8#347 /ccedilla
    8#350 /egrave       8#351 /eacute         8#352 /ecircumflex    8#353 /edieresis
    8#354 /igrave       8#355 /iacute         8#356 /icircumflex    8#357 /idieresis
    8#360 /eth          8#361 /ntilde         8#362 /ograve         8#363 /oacute
    8#364 /ocircumflex  8#365 /otilde         8#366 /odieresis      8#367 /oe
    8#370 /oslash       8#371 /ugrave         8#372 /uacute         8#373 /ucircumflex
    8#374 /udieresis    8#375 /yacute         8#376 /thorn          8#377 /germandbls
T1.encode

%%* Now copy T1Encoding into OT1Encoding and make a few changes.
T1Encoding OT1Encoding copy pop

mark
    8#000 /Gamma        8#001 /Delta          8#002 /Theta          8#003 /Lambda
    8#004 /Xi           8#005 /Pi             8#006 /Sigma          8#007 /Upsilon
    8#010 /Phi          8#011 /Psi            8#012 /Omega          8#013 /ff
    8#014 /fi           8#015 /fl             8#016 /ffi            8#017 /ffl
    8#020 /dotlessi     8#021 /dotlessj       8#022 /grave          8#023 /acute
    8#024 /caron        8#025 /breve          8#026 /macron         8#027 /ring
    8#030 /cedilla      8#031 /germandbls     8#032 /ae             8#033 /oe
    8#034 /oslash       8#035 /AE             8#036 /OE             8#037 /Oslash
    8#040 /polishslash  8#042 /quotedblright  8#074 /exclamdown     8#076 /questiondown
    8#134 /quotedblleft 8#137 /dotaccent      8#173 /endash         8#174 /emdash
    8#175 /hungarumlaut 8#177 /dieresis
OT1.encode

%%* And add the remaining characters from the OT1Encoding and
%%* T1Encoding vectors to .char.map.

mark
    /Gamma (\\Gamma ) /Delta (\\Delta ) /Theta (\\Theta ) /Lambda (\\Lambda )
    /Xi (\\Xi ) /Pi (\\Pi ) /Sigma (\\Sigma ) /Upsilon (\\Upsilon )
    /Phi (\\Phi ) /Psi (\\Psi ) /Omega (\\Omega )
    /dotlessj (j) /ff (ff) /cwm () /perthousandzero (0) /polishslash ()
    /Abreve (A*) /Aogonek (A,) /Cacute (C') /Ccaron (C^) /Dcaron (D^)
    /Ecaron (E^) /Eogonek (E,) /Gbreve (G*) /Lacute (L') /Lcaron (L^)
    /Nacute (N') /Ncaron (N^) /Eng (NG) /Ohungarumlaut (O"") /Racute (R')
    /Rcaron (R^) /Sacute (S') /Scaron (S^) /Scedilla (S,) /Tcaron (T^)
    /Tcedilla (T,) /Uhungarumlaut (U"") /Uring (U*) /Ydieresis (Y")
    /Zacute (Z') /Zcaron (Z^) /Zdot (Z.) /IJ (IJ) /Idot (I.) /dbar (d-)
    /abreve (a*) /aogonek (a,) /cacute (c') /ccaron (c^) /dcaron (d^)
    /ecaron (e^) /eogonek (e,) /gbreve (g*) /lacute (l') /lcaron (l^)
    /nacute (n') /ncaron (n^) /eng (ng) /ohungarumlaut (o"") /racute (r')
    /rcaron (r^) /sacute (s') /scaron (s^) /scedilla (s,) /tcaron (t^)
    /tcedilla (t,) /uhungarumlaut (u"") /uring (u*) /zacute (z') /zcaron (z^)
    /zdot (z.) /ij (ij) /Germandbls (SS)
.chars.def

%%* We extend the df-tail command to stick in an Encoding vector (see
%%* above for a discussion of the T1 and OT1 encodings), put in a
%%* FontName (which will just be dvips's name for the font, e.g., Fa,
%%* Fb, etc.)
%%* and give each font a separate FontBBox instead of
%%* letting them all share a single one.

/dvips.df-tail          % id numcc maxcc df-tail
{
    /nn 9 dict N
    nn begin
    %%
    %% Choose an encoding based on the highest position occupied.
    %%
    dup 128 gt { T1Encoding } { OT1Encoding } ifelse
    /Encoding X
    /FontType 3 N
    %%
    %% It's ok for all the fonts to share a FontMatrix, but they
    %% need to have separate FontBBoxes
    %%
    /FontMatrix fntrx N
    /FontBBox [0 0 0 0] N
    string /base X
    array /BitMaps X
    %%
    %% And let's throw in a FontName for good measure
    %%
    dup (          ) cvs
    %%
    %% Make sure each font gets its own private FontName. -- dmj,
    %% 12/23/97
    %%
    dup length string copy
    /FontName X
    /BuildChar {CharBuilder} N
    end
    dup { /foo setfont }
    2 array copy cvx N
    load 0 nn put
    /ctr 0 N
    [
} def

%%* This is functionally equivalent to dvips's /D procedure, but it
%%* also calculates the Font Bounding Box while defining the
%%* characters.

/dvips.D    % char-data ch D - : define character bitmap in current font
{
    /cc X                               % char-data
    dup type /stringtype ne {]} if      % char-data

    /ch-xoff where {
        pop
    } {
        dup /Cd exch def
        /ch-width { Cw } def
        /ch-height { Ch } def
        /ch-xoff { Cx } def
        /ch-yoff { Cy } def
        /ch-dx { Cdx } def
    } ifelse

    /ch-data X
    nn /base get cc ctr put             % (adds ctr to cc'th position of BASE)
    nn /BitMaps get ctr ch-data         % BitMaps ctr char-data
    sf 1 ne {
        dup dup length 1 sub dup 2 index S get sf div put
    } if
    put                                 % puts char-data into BitMaps at index ctr
    /ctr ctr 1 add N
    %%
    %% Make sure the Font Bounding Box encloses the Bounding Box of the
    %% current character
    %%
    nn /FontBBox get                    % BB
    dup                                 % calculate new llx
    dup 0 get
    ch-xoff
    .min
    0 exch put
    dup                                 % calculate new lly
    dup 1 get
    ch-yoff ch-height sub
    .min
    1 exch put
    dup                                 % calculate new urx
    dup 2 get
    ch-dx ch-width add
    .max
    2 exch put
    dup 3 get                           % calculate new ury
    ch-yoff
    .max
    3 exch put
} def

%%* Define start-hook to replace df-tail and D by our versions.
%%* Unfortunately, the user can redefine start-hook and thus bypass
%%* these changes, but I don't see an obvious way around that.

userdict /start-hook {
    TeXDict /df-tail /dvips.df-tail load bind put
    TeXDict /D /dvips.D load bind put
} put

%%* Introduce a symbolic constant for hyphens.  (Need to make
%%* allowance for hyphen being in different place?)

/.hyphen 45 def

% Write out a string.  If it ends in a letter and a hyphen,
% don't write the hyphen, and set .show.last to a hyphen;
% otherwise, set .show.last to the character (or \000 if it was a hyphen).

/.show.write                            % <string> .show.write -
{
    dup length 1 ge
    {
        dup dup length 1 sub get        % string last_char
        dup .hyphen eq                  % string last_char hyphen?
        {                               % string last_char
            1 index length 1 gt
                { 1 index dup length 2 sub get }
                { //.show.last 0 get }
            ifelse                      % string last_char prev-char
            currentfont /Encoding get exch get  % look up prev-char
            //.letter.chars exch known  % is it a letter?
            {                           % Remove the hyphen
                                        % string last_char
                exch                    % last_char string
                dup length 1 sub        % last_char string len-1
                0 exch getinterval      % last_char string-1
                exch                    % string-1 last_char
            }
            { pop 0 }                   % string 0
            ifelse
        } if
        //.show.last 0 3 -1 roll put   % store last_char in .show.last
                                        % If .show.last == hyphen, then
                                        % last char of previous string
                                        % was a hyphen
    } if                                % string
    currentfont /FontType get 0 ne {
        {                               % begin forall
                                        % c
            dup                         % c c
            currentfont /Encoding get   % c c vec
            exch get                    % c name
            dup //.char.map exch known  % c name bool
                { exch pop }
                { pop OT1Encoding exch get }
            ifelse                      % name
            //.char.map exch get        % translation
            .show.stdout exch writestring
        } forall
    } {
        (\0) dup 0 get 0 eq {
            0 1 put
            (%stderr) (w) file
            dup (*** Warning: composite font characters dumped without decoding.\n) writestring
            closefile
        } {
            pop
        } ifelse
        .show.stdout exch writestring
    } ifelse
} odef

/.showstring1 {                         % string
    currentpoint .coord                 % string x y
    3 -1 roll dup .showwidth            % x y string dx dy
    1 index                             % x y string dx dy dx
    0 rmoveto                           % x y string dx dy
    .dcoord pop                         % x y string width
    SIMPLE {                            % x y string width
        2 index                         % x y string width y
        //.show.y .iget                 % x y string width y old.y

        %%*
        %%* Replaced test "has y changed" by "has y changed by more
        %%* than the current font height" so that subscripts and
        %%* superscripts won't cause line/paragraph breaks
        %%*

        sub abs dup                     % x y string width dy dy
        //.show.height .iget gt {       % x y string width dy

            %%* Vertical position has changed by more than the font
            %%* height, so we now try to figure out whether we've
            %%* started a new paragraph or merely a new line, using a
            %%* variety of heuristics.

            %%* If any of the following is true, we start a new
            %%* paragraph:

            %%* (a) the current vertical shift is more than 1.1 times
            %%*     the previous vertical shift, where 1.1 is an
            %%*     arbitrarily chosen factor that could probably be
            %%*     refined.

            dup                         % x y string width dy dy
            //.show.dy .iget 1.1 mul
            gt exch

            %%* Save the new vertical shift

            //.show.dy exch .iput

            %%* (b) The vertical shift is more than 1.3 times the
            %%*     "size" of the current font.  I've removed this
            %%*     test since it's not really very useful.

%%*         //.show.dy .iget
%%*         //.show.height .iget 1.4 mul
%%*         gt                          % x y string width bool
%%*         .show.height .iget 0 gt and % only perform test if font
%%*                                     % height is nonzero
%%*         or

            %%* (c) the first character of the new line is one of the
            %%*     .break.chars

            2 index length              % x y string width newpar? len
            0 gt                        % x y string width newpar? len>0?
            {
                2 index 0 get           % x y string width newpar? s
                currentfont /Encoding get
                exch get                % x y string width newpar? s_enc
                //.break.chars exch known { pop true } if
            } if                        % x y string width newpar?

            %%* (d) The indentation of the new line is greater than
            %%*     the indentation of the previous line.

            4 index
            //.show.indent .iget
            gt or

            %%* HOWEVER, if the line ends in a hyphen, we do NOT begin
            %%* a new paragraph (cf. comment at end of BF2).  --dmj,
            %%* 12/23/97

            //.show.last 0 get .hyphen ne
            and                         % newpar?
            { (\n\n) }                  % Paragraph
            {                           % Line

                %%*
                %%* BF2: If last character on a line is
                %%* a hyphen, we omit the hyphen and
                %%* run the lines together.  Of
                %%* course, this will fail if a word
                %%* with an explicit hyphen (e.g.,
                %%* X-ray) is split across two lines.
                %%* Oh, well.  (What should we do
                %%* about a hyphen that ends a
                %%* "paragraph"?  Perhaps that should
                %%* inhibit a paragraph break.)
                %%*

                //.show.last 0 get .hyphen eq
                { () } { ( ) } ifelse   % x y string width char
            }
            ifelse //print

            //.show.y 3 index .iput     % x y string width
            //.show.x 4 index .iput     % x y string width
            //.show.indent 4 index .iput

        } {                             % x y string width dy
            % If the word processor split a hyphenated word within
            % the same line, put out the hyphen now.
            pop
            //.show.last 0 get .hyphen eq { (-) //print } if
        } ifelse

        %%*
        %%* If we have moved more than 1 point (10 units)
        %%* to the right, interpret it as a
        %%* space?  This needs to be looked at
        %%* more closely.
        %%*

        3 index                         % x y string width x
        //.show.x .iget 10 add gt       % x y string width bool
        { ( ) //print } if
                                        % x y string width
        4 1 roll                        % width x y string
        .show.write pop                 % width x
        add //.show.x exch .iput        % remember where the string ended
    } {
        (S ) //print .show==4
    } ifelse
} odef

/.showstring
 { dup () eq { pop } { .showstring1 } ifelse
 } bind def

% Redefine all the string display operators.

/show {
    .showfont .showcolor .showstring
} codef

% We define all the other operators in terms of .show1.
/.show1.string ( ) def
/.show1 { //.show1.string exch 0 exch put //.show1.string .showstring } odef
/ashow {
    .showfont .showcolor
    { .show1 2 copy rmoveto } forall
    pop pop
} codef
/awidthshow {
    .showfont .showcolor
    {                                   % Stack: cx cy char ax ay c
        dup .show1 3 index eq { 4 index 4 index rmoveto } if
        2 copy rmoveto
    }
    forall
    pop pop pop pop pop
} codef
/widthshow {
    .showfont .showcolor
    //.show1.string 0 4 -1 roll put
    { //.show1.string search not { exit } if
      .showstring .showstring
      2 index 2 index rmoveto
    } loop
    .showstring pop pop
} codef
/kshow {
    .showfont .showcolor
        %**************** Should construct a closure, in case the procedure
        %**************** affects the o-stack.
    { .show1 dup exec } forall pop
} codef

% We don't really do the right thing with the Level 2 show operators,
% but we do something semi-reasonable.
/xshow { pop show } codef
/yshow { pop show } codef
/xyshow { pop show } codef
/glyphshow {
    currentfont /Encoding .knownget not { {} } if
    0 1 2 index length 1 sub {
        % Stack: glyph encoding index
        2 copy get 3 index eq { exch pop exch pop null exit } if pop
    } for
    null eq { (X) dup 0 4 -1 roll put show } { pop } ifelse
} codef

end

% Bind the operators we just defined, and all the others if we didn't
% do it before.

DELAYBIND { .bindnow } if

% Make systemdict read-only if it wasn't already.

systemdict wcheck { systemdict readonly pop } if

% Restore the current local/global VM mode.

exec