MKDICT(7)

NAME
       mkdict, lkdict

SYNOPSIS
       mkdict [-p $prefix] [-F $sep] $file
       lkdict [-Q] [-d $dir] [-H $header] [-n] [-S $nsep] $query $dic [$text]

DESCRIPTION
       Mkdict creates a dictionary named "dict:$file" using a full
       inverted index, and lkdict is a tool to look up the dictionary.

       Mkdict and lkdict were developed so that I could use a dictionary
       named "Eijiro" on Plan9. These two tools are designed to handle
       large text files. Note that the file is as large as 200MB, so
       grep takes a long time (unless the file is well cached).

       Eijiro is not a free dictionary. If you want the dictionary, you
       need to pay for it.

       Mkdict and lkdict are designed for any text file. The file need
       not be formatted as a dictionary.

              term% t=eiji-138.txt
              term% mv $t $t.orig    # rename original file
              term% tcs -f ms-kanji $t.orig | tr -d \x0d | KR_trans u > $t

       where u is a file with the contents:

              # hex
              ef:bf:bd ''

       Note that "ef:bf:bd" is RUNEERROR. (We need to remove rune errors
       from the text file.)

              term% cd $home/dic
              term% extwords1 -iohhl2 eiji-138.txt |sort -u +0 -1 +1n >eijiro
              term% mkdict eijiro
              term% mktoc -l2 dict:eijiro > indx:eijiro
              term% ls -lt
              --rw-rw-r-- M 153 arisawa arisawa   2132650 Sep  1 10:13 indx:eijiro
              --rw-r--r-- M 153 arisawa arisawa  29714492 Sep  1 10:02 list:eijiro
              --rw-r--r-- M 153 arisawa arisawa  30784953 Sep  1 10:02 dict:eijiro
              --rw-rw-r-- M 153 arisawa arisawa 228594575 Sep  1 10:00 eijiro
              --rw-rw-r-- M 153 arisawa arisawa 210405926 Aug 29 22:06 eiji-138.txt
              --rw-rw-rw- M 153 arisawa arisawa 164426072 Apr 28 04:26 eiji-138.txt.orig
              term%

       After this, you may remove the file "eijiro".

       The format of $query for lkdict is

              query := orlist
              query := orlist " " query

       where "orlist" is

              orlist := re
              orlist := re "|" orlist

       The "re" is a regular expression for the words in the dictionary.
       If the text is written in English, the words are alphanumeric
       strings that begin with a letter. This is the same rule as for
       variable names in programming languages, except that the words
       are all lower case.
       Therefore capital letters should not be included, and special
       characters that are unused in regular expressions should not be
       included either. There is an additional rule: the first two
       characters of the words must be the same as those of the word to
       be retrieved.

       Examples of queries:

              "foo"
              "foo$"
              "foo bat|baz"
              "foo bar$|baz"
              "fo[abc].*y"

       Lkdict will look up these terms in a single line.

       $query is a query to the dictionary, $dic is the dictionary name
       and $text is the path to the original text file.

       The options are:

       -p $prefix   # prefix to "dict:", "list:" and "indx:".
       -d $dir      # directory of the dictionary.
       -Q           # input/output is in QID format (%%016llx).
       -n           # mark lines with the line number.
       -H $header   # mark lines with $header.
       -S $nsep     # string that separates the line number and the
                    # matched text. (default ":")

       The rc script "kdict" is provided to make "lkdict" more user
       friendly. See also MAN_KDICT.

AUTHOR
       Kenji Arisawa
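
       The query grammar given in DESCRIPTION can be illustrated with
       a short Python sketch. This is not part of mkdict or lkdict,
       and the names parse_query and prefix_ok are hypothetical. It
       assumes that orlists are separated by spaces and alternatives
       by "|", and it reads the two-character rule as "each re starts
       with a lower case letter followed by a lower case letter or
       digit". The real tools may check this differently.

```python
import re


def parse_query(query):
    """Split a query into orlists; each orlist is a list of regular expressions.

    Follows the grammar from the manual:
        query  := orlist | orlist " " query
        orlist := re     | re "|" orlist
    Assumption: "|" is always an orlist separator, never part of a re.
    """
    orlists = []
    for term in query.split(" "):
        if term == "":
            continue  # tolerate repeated spaces (an assumption, not in the manual)
        orlists.append(term.split("|"))
    return orlists


def prefix_ok(regex):
    """Check that a re begins with two literal lower case word characters.

    Assumption: this is how the "first two characters" rule is interpreted here.
    """
    return re.fullmatch(r"[a-z][a-z0-9]", regex[:2]) is not None


if __name__ == "__main__":
    for q in ["foo", "foo bar$|baz", "fo[abc].*y"]:
        parsed = parse_query(q)
        ok = all(prefix_ok(r) for orlist in parsed for r in orlist)
        print(q, parsed, ok)
```

       A query such as "f.*" or "Foo" fails the prefix check in this
       sketch, because its first two characters are not both literal
       lower case word characters.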