change character classification from unicode-first to this priority: ascii, utf8, binary, latin1. use a private function to recognize utf8. these changes allow us to recognize utf8 code points between 0xffff and 0x10ffff, as well as latin1. dbcs recognition is also possible; that code is deferred to a subsequent patch. the utf-8 range 0xa0-0xff is now called "latin", not "Extended Latin".