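If you happen to have a copy of the OED2 on CD, decompress.c may or may not help.

Usage:

First read the two offsets stored in the header of OED2.DAT:

	{dump 0x0 0x4; dump 0x40 0x44} < OED2.DAT | hd

The first number is your start offset less 0x8000; the second is your end offset. Then extract and decompress the text, where 0x(first+8000) means the first number plus 0x8000, written in hex, and 0xsecond is the second number:

	dump 0x(first+8000) 0xsecond < OED2.DAT | decompress > OED2.sgml

The rest is all in dict, with some scripting to take the mkindex output and turn it into an actual index.

Tested against a copy with this sha1sum:

	626fab18cc9a25feafcf4080901c834e3ca05af7  /n/oed/OED2.DAT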