change character classification from unicode-first to this priority order:
ascii, utf8, binary, latin1. use a private function to recognize utf8.
these changes let us recognize utf8 sequences for code points above 0xffff
(up to 0x10ffff), as well as latin1 text. dbcs recognition is also possible;
that code is deferred to a subsequent patch.

utf-8 characters in the range 0xa0-0xff are now called "latin", not "Extended Latin".