|
Tcs interprets the named file(s) (standard input by default) as a
stream of characters from the ics character set or format, converts
them to runes, and then converts them into a stream of characters
from the ocs character set or format on the standard output. The
default value for both ics and ocs is utf, the UTF encoding
described in utf(6). The -l option lists the character sets known
to tcs. Processing continues in the face of conversion errors
(the -s option suppresses reporting of these errors). The -c option
forces the output to contain only correctly converted characters;
otherwise, Runeerror (0xFFFD) characters are
substituted for UTF encoding errors and unknown characters.
The -v option generates various diagnostic and summary information
on standard error, or makes the -l output more verbose.
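
The two-stage conversion described above (input bytes to runes, runes to output bytes) can be sketched in Python. This is only an illustration of the idea, not the tcs implementation: the function convert and its only_correct parameter are hypothetical, and ics and ocs here are Python codec names rather than tcs character set names.

```python
def convert(data: bytes, ics: str = "utf-8", ocs: str = "utf-8",
            only_correct: bool = False) -> bytes:
    """Decode data from ics into code points, then encode them as ocs.

    only_correct loosely mimics the -c option: characters that cannot
    be converted are dropped instead of being replaced.
    """
    policy = "ignore" if only_correct else "replace"
    # Stage 1: bytes -> runes. With "replace", each malformed input
    # sequence becomes U+FFFD (the Runeerror value named in the text).
    runes = data.decode(ics, errors=policy)
    # Stage 2: runes -> bytes. With "replace", Python writes "?" for a
    # character the output codec lacks; that is Python's behavior and
    # is not claimed to match tcs.
    return runes.encode(ocs, errors=policy)


if __name__ == "__main__":
    print(convert(b"caf\xe9", ics="latin-1"))        # b'caf\xc3\xa9'
    print(convert(b"ab\xffcd"))                      # b'ab\xef\xbf\xbdcd'
    print(convert(b"ab\xffcd", only_correct=True))   # b'abcd'
```

The second call shows the default behavior: the invalid byte 0xFF is replaced by U+FFFD, which appears in the UTF-8 output as the three bytes EF BF BD. The third call drops it instead.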

Tcs recognizes an ever-changing list of character sets. In particular,
it supports a variety of Russian and Japanese encodings. Some
of the supported encodings are:
utf        The Plan 9 UTF encoding, known by ISO as UTF-8
utf1       The deprecated original UTF encoding from ISO 10646
ascii      7-bit ASCII
8859-1     Latin-1 (Western European)
8859-2     Latin-2 (Czech .. Slovak)
8859-3     Latin-3 (Dutch .. Turkish)
8859-4     Latin-4 (Scandinavian)
8859-5     Part 5 (Cyrillic)
8859-6     Part 6 (Arabic)
8859-7     Part 7 (Greek)
8859-8     Part 8 (Hebrew)
8859-9     Latin-5 (Finnish .. Portuguese)
html       Unicode as encoded by HTML
koi8       KOI-8 (GOST 19768-74)
jis-kanji  ISO 2022-JP
ujis       EUC-JX: JIS 0208
ms-kanji   Microsoft, or Shift-JIS
jis        (from only) guesses between ISO 2022-JP, EUC or Shift-JIS
gb         Chinese national standard (GB2312-80)
big5       Big 5 (HKU version)
unicode    Unicode Standard 1.0
tis        Thai character set plus ASCII (TIS 620-1986)
msdos      IBM PC: CP 437
atari      Atari-ST character set
nfd        Unicode Normalization Form D
nfc        Unicode Normalization Form C
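
The last two entries name Unicode normalization forms rather than character sets. As a rough illustration of what the two forms mean, the sketch below uses Python's standard unicodedata module; the helper names to_nfd and to_nfc are hypothetical, and nothing here is taken from the tcs source.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (decompose, then recompose)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    composed = "\u00e9"      # e with acute accent as a single code point
    decomposed = "e\u0301"   # letter e followed by a combining acute accent
    print(to_nfd(composed) == decomposed)   # True
    print(to_nfc(decomposed) == composed)   # True
```

Both strings display identically but differ in their code points, so converting between the two forms changes the byte stream even when input and output use the same encoding.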
|