.TH DOC2TXT 1
.SH NAME
doc2txt, wdoc2txt, xls2txt, olefs, mswordstrings, msexceltables \- extract printable strings from Microsoft Office documents
.SH SYNOPSIS
.B doc2txt
[
.I file.doc
]
.br
.B wdoc2txt
[
.I file.doc
]
.br
.B xls2txt
[
.I file.xls
]
.br
.B aux/olefs
[
.B -m
.I mtpt
]
.I file.doc
.br
.B aux/mswordstrings
.I /mnt/doc/WordDocument
.br
.B aux/msexceltables
[
.B -aDnt
]
[
.B -d
.I delim
]
[
.B -w
.I worksheet-range
]
.I /mnt/doc/Workbook
.SH DESCRIPTION
.I Doc2txt
is an
.IR rc (1)
script that uses
.I olefs
and
.I mswordstrings
to extract the printable text from the body of a Microsoft Word document
and write it on the standard output.
.I Wdoc2txt
is similar, but uses
.IR plumb (1)
to send the output to a new
.IR acme (1)
window instead.
.I Xls2txt
performs a similar function for Microsoft Excel documents.
.PP
Microsoft Office documents are stored in OLE
(Object Linking and Embedding) format, which is
a scaled-down version of Microsoft's FAT file system.
.I Olefs
presents the contents of an Office document as a file system on
.IR mtpt ,
which defaults to
.BR /mnt/doc .
.I Mswordstrings
or
.I msexceltables
may then be used to parse the files inside, extracting a text stream.
.PP
.I Msexceltables
may be given options to control the formatting of its output.
.TP
.B -n
Disable field padding to column width.
.TP
.B -t
Truncate fields to the column width.
.TP
.B -a
Attempt conversion of non-tabular sheets (charts) in the workbook.
.TP
.BI -d " delim
Set the interfield delimiter to the string
.IR delim ,
by default a single space.
.TP
.B -D
Enable debugging output.
.TP
.BI -w " worksheet-range
Specify which worksheets to process; by default all tabular sheets are output.
Suppressed chart pages are always included in the sheet count.
Arbitrary lists of pages or page ranges may be given:
individual pages are separated by commas,
and the two ends of a range are separated by a minus sign.
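.PP
The steps that
.I doc2txt
combines can also be run by hand.
The following is only a sketch, assuming a hypothetical file
.B memo.doc
and the default mount point; the actual script may differ in detail.
.EX
# memo.doc is a hypothetical file name used for illustration
# present the OLE container as a file system on /mnt/doc
aux/olefs memo.doc
# write the printable text of the Word stream on standard output
aux/mswordstrings /mnt/doc/WordDocument
# release the mount point when finished
unmount /mnt/doc
.EE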
.SH EXAMPLE
Extract worksheets 1, 7, 9 through 14, and 3 through 4 of
.B report.xls
without field padding, using
.B @
as the field delimiter.
.EX
aux/olefs report.xls
aux/msexceltables -w 1,7,9-14,3-4 -n -d '@' /mnt/doc/Workbook
unmount /mnt/doc
.EE
.SH SOURCE
.B /rc/bin/doc2txt
.br
.B /rc/bin/wdoc2txt
.br
.B /rc/bin/xls2txt
.br
.B /sys/src/cmd/aux/msexceltables.c
.br
.B /sys/src/cmd/aux/mswordstrings.c
.br
.B /sys/src/cmd/aux/olefs.c
.SH SEE ALSO
.IR strings (1)
.br
``Microsoft Word 97 Binary File Format'',
available on line at Microsoft's developer home page.
.br
``LAOLA Binary Structures'',
.I http://snake.cs.tu-berlin.de:8081/~schwartz/pmh
.br
``OpenOffice.Org's Excel Documentation'',
.I http://sc.openoffice.org/excelfileformat.pdf