KIRARA(7)

NAME
    mkall, mkdb, mkdbL, extwords, istext, mktoc, db.conf, kirara.conf

SYNOPSIS
    mkall
    mkdb [-b] [-r] [-u] [-L] [-C]
    mkdbL
    extwords [-l $minlen] [-adhino] $path ...
    istext [-deh] $path ...
    mktoc [-l $minlen] [-q] [index]
    db.conf
    kirara.conf

DESCRIPTION
    These are the back-end programs of the Kirara database (ver.1.4
    and ver.2.1). We refer to ver.1.x as Kirara1 and to ver.2.x as
    Kirara2. In the explanation below, $kirara stands for $kirara1 or
    $kirara2, the environment variables that give the locations of
    Kirara1 and Kirara2.

    Mkall and mkdb are rc scripts that create the database.
    The recommended locations of these programs are:

        $kirara/^(mkall kirara.conf)
        $kirara/sysdb/^(mkdb db.conf)
        $kirara/usrdb/^(mkdb db.conf)

    You may have as many databases as you like in $kirara, with names
    of the form $kirara/*db.

    You need the environment variable $kirara1 for Kirara1 and
    $kirara2 for Kirara2. The recommended locations for cwfs are:

        /n/other/kirara1    for Kirara1
        /n/other/kirara2    for Kirara2

    and if your fs is fossil, kfs is a better place:

        /n/kfs/kirara1      for Kirara1
        /n/kfs/kirara2      for Kirara2

    If you want to install Kirara in another directory, you need to
    change the environment variables kirara1 and kirara2 in these
    scripts to that directory. See INSTALL for details.

    Check the values of $kirara1 (or $kirara2) and $wordlen in
    kirara.conf, and the values of $target and $exclude in db.conf.
    These are the configuration variables.

    Extwords and mktoc both take a $minlen option, which determines
    the minimum length of the words to be indexed. You must give the
    same value to both extwords and mktoc. The value of $wordlen is
    passed to extwords and mktoc as $minlen.
    The value minlen=2 might be too small. If you find that it
    produces output that is too verbose, it is better to choose
    minlen=3.
    The configuration for $target is defined in db.conf:

        target=(/lib /sys/lib /sys/src /sys/man /sys/include /sys/doc /rc)

    and $exclude is a regular expression pattern defined in
    kirara.conf:

        exclude='\.(zip|tar|tgz|gz|z|Z|iso|bz2|png|jpg|jpeg|tif|tiff|gif|pdf|ps|hqx|[soi8])$'

    Since binary files are automatically excluded from indexing,
    listing these extensions is not strictly necessary.
    You may add more files to exclude. For example, I have

        exclude='^/lib/dict/|'$exclude

    in sysdb, because the dictionary files are large and contain
    almost all words.

    If you use the event log you also need

        eventlog=/sys/log/elnfs

    You should have your own configuration.

    The options to mkdb are:

        -b              # build from scratch
        -r              # update the dirs and the subdirs using $target
        -r $dir ...     # update the dirs and the subdirs
        -u $dir ...     # update the dirs (only the dirs)
        -d $dir ...     # delete the dirs in $target
        -L              # update using event log
        -C              # check database
        -S              # get some statistics

    MkdbL is an rc script that updates all databases based on the
    event log. The recommended location is $home/bin/rc.

    Mkdb internally uses:

        lr          in this package
        mktoc       in this package
        extwords    in this package
        9xa         in this package
        ffm         in this package
        mkkind      in this package

    Extwords extracts words from files. A word is a string that
    consists of alpha-numerics and underscores. Here "alpha" means
    Western European runes, as used in English and many other
    languages. See MAN_KFIND for details. Capital letters are
    converted to lower case.
    The options are:

        -l $minlen  # minimum length of the words
        -d          # the paths are directories
        -i          # contents to lower case
        -h          # header name is last element of path
        -hh         # no header name
        -a          # add header info to contents (for indexing)
                    # not recommended
        -o          # add line offset
        -n          # numerics at the beginning are removed in
                    # extracting words

    NB: the -a option is not reliable for files whose names include
    special symbols such as *, &, $, etc. Kirara does not allow such
    symbols in the index, so the use of the -a option may produce
    unexpected results.
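The word rule described above can be illustrated with a short Python sketch. This is not the extwords source, only an approximation under stated assumptions: the function name extract_words and its parameters are hypothetical, only ASCII letters, digits and underscores are treated as word characters (the real set of Western European runes is described in MAN_KFIND), and the length check is assumed to apply after any leading digits are removed.

```python
import re

# Assumption: a word is a run of ASCII letters, digits and underscores.
# The real extwords also accepts other Western European runes.
WORD_RE = re.compile(r"[A-Za-z0-9_]+")


def extract_words(text, minlen=2, strip_leading_digits=False):
    """Return the lower-cased words of text that have at least minlen characters.

    strip_leading_digits imitates the described effect of the -n option
    (9p2000 becomes p2000); the order of stripping and length filtering
    is an assumption of this sketch.
    """
    words = []
    for match in WORD_RE.finditer(text):
        word = match.group().lower()
        if strip_leading_digits:
            word = word.lstrip("0123456789")
        if len(word) >= minlen:
            words.append(word)
    return words


if __name__ == "__main__":
    # prints ['mkdb', 'internally', 'uses', '9xa']
    print(extract_words("Mkdb internally uses 9xa"))
```

Raising minlen to 3 in this sketch drops two-letter words, which mirrors the reason the text gives for preferring minlen=3 when the index becomes too verbose.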
    The length of an indexed word is at least $minlen. This value is
    set with the option "-l". The default value is 2.

    The option "-d" works as follows:

        extwords $path/*

    is the same as

        extwords -d $path

    The option "-a" adds the header information to the content.

    The option "-h" selects the level of detail of the header, as
    shown by these examples:

        term% extwords /sys/doc/*
        /sys/doc/8½:
        /sys/doc/9.html: xml version encoding ...

        term% extwords -h /sys/doc/*
        8½:
        9.html: xml version encoding ...

        term% extwords -hh /sys/doc/*
        xml version encoding ...

    The option "-o" is used only for making a dictionary file. See
    MAN_MKDICT for details.

    The option "-n" is provided for compatibility with older
    versions, which extract, for example, p2000 from 9p2000. (Note
    that the leading numerics are discarded.)

    Extwords evaluates the rate of errors after reading 100 runes. If
    the rate exceeds 5%, extwords stops and exits.

    Istext examines the files given as arguments and reports only the
    text files. It reads the first 1000 runes to make the
    determination. Istext is used in G1 (see KFIND(1)).
    The options are:

        -d          # the argument is a directory
        -h          # report only names (not full path)
        -a          # read to eof and report the statistics

    Once you have finished the preparation,

        cd $kirara && mkall

    will construct everything that is needed. This will take a long
    time. For example, it takes 30 minutes in my case:

        1782 directories in sysdb
        4868 directories in usrdb

    After building the database you will find some files:

        main mtoc

    and for Kirara1,

        qtd qd

    and for Kirara2,

        qtf qf

    In addition you will have

        qdir/*/

    where 'qdir/*' denotes directories such as

        qdir/0000000000014f05
        qdir/0000000000014f0a
        ...

    These directories hold the database for large directories (for
    Kirara1) or for large files (for Kirara2).

    Note that the words indexed in main are the words produced by
    extwords. Thus they are alphanumerics (if the language is
    English).

    Mktoc makes a meta index (table of contents) of main.
    The options are:

        -q          # quiet output
        -l $minlen  # the smallest word length in the index
                    # (default 2)

    You will find files whose names begin with "_". These are
    temporary files that are created while mkdb runs. They may be
    removed.

    Updating does not take much time. The time depends on how many
    files have been updated. When only a few directories have been
    updated, mkdb takes 30 seconds to 2 minutes in my case.

    Elnfs creates an event log, which makes updating faster.
    To enable elnfs, you need /sys/log/elnfs. You are probably the
    host owner. In my case:

        a-rw-rw-r-- M 24 arisawa sys 1397 Jul 14 06:41 elnfs

    where arisawa is the host owner of the system. Note that the
    owner of elnfs is not sys but the host owner, because mkdb
    executes:

        chmod -a elnfs && >elnfs && chmod +a elnfs

    If your fs is cwfs64x, run the command below before you start
    rio:

        cd / && elnfs /usr

    For other file systems, see MAN_ELNFS in this package.

    Example log entries are as follows:

        maia Jul 13 19:54:45 wopen: ./arisawa/src/worddb/MAN
        maia Jul 13 20:17:06 create: ./arisawa/src/worddb/README
        maia Jul 13 20:46:42 wopen: ./arisawa/src/worddb/README
        maia Jul 14 06:35:39 remove: ./arisawa/doc/prog/a.c
        maia Jul 14 06:35:39 remove: ./arisawa/doc/prog/mkfile
        maia Jul 14 06:35:39 remove: ./arisawa/doc/prog/
        maia Jul 14 06:41:23 rename: ./arisawa/db/ updatedb

UNINSTALLATION
    Kirara writes only in the directories under $kirara1 or $kirara2.
    Thus, if you don't need Kirara, simply remove all the contents of
    those directories.

SEE ALSO
    KFIND(1)    # MAN_KFIND
    ELNFS(4)    # MAN_ELNFS
    LNFS(4)     # MAN_LNFS

BUGS
    (a) Kirara1
        Due to the mtime bug of cwfs, mkdb -r may not update
        correctly. Cwfs does not update the mtime of a directory even
        if the contents of a file in it are modified. To make the
        directory change its mtime, we need to touch the directory.
        (Creating or deleting a file has the same effect.)
        Ken fs and kfs also have the same bug.
        Fossil does not have this bug.
        Mkdb with the option "-L" is free from this problem.
        The mtime bug should be fixed.
    (b) Kirara1
        Due to the mtime bug of cwfs,

            mkdb -C     # check integrity of the database

        may lie about the integrity. That is, even if some data in
        the database is out of date, the database may pass the check.

    (c) Both Kirara1 and Kirara2
        Updating based on the event log file /sys/log/elnfs has some
        problems:
        Updates to files in $home/bin/rc will not be logged to elnfs
        if you edit those files through /bin. That is, the event log
        may miss changes because of name space reconfiguration.
        Plan 9 is a distributed operating system. Therefore files may
        be changed on the terminal, and then you need to run elnfs on
        the terminal as well.
        In conclusion, "fs events" should be directly supported by
        the file systems.

SOURCE
    http://plan9.aichi-u.ac.jp/netlib/kirara/

AUTHOR
    Kenji Arisawa
    arisawa@aichi-u.ac.jp