可冰

冰,是沉睡着的水......

  C++博客 :: 首页 :: 新随笔 :: 联系 :: 聚合  :: 管理 ::
  37 随笔 :: 5 文章 :: 94 评论 :: 0 Trackbacks

UTF-8
  • Inital EF BB BF is a signature, indicating that the rest of the file is UTF-8.
  • Any EF BF BE is an error.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-8N
  • All of the text is normal UTF-8; there is no signature.
  • Inital EF BB BF is a ZWNBSP.
  • Any EF BF BE is an error.
UTF-16
  • Initial FE FF is a signature indicating the rest of the text is big endian UTF-16.
  • Initial FF FE is a signature indicating the rest of the text is little endian UTF-16.
  • If neither of these are present, all of the text is big endian.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-16BE
  • All of the text is big endian: there is no signature.
  • Initial FE FF is a ZWNBSP.
  • Any FF FE is an error.
UTF-16LE
  • All of the text is little endian: there is no signature.
  • Initial FF FE is a ZWNBSP.
  • Any FE FF is an error.
UTF-32
  • Initial 00 00 FE FF is a signature indicating the rest of the text is big endian UTF-32.
  • Initial FF FE 00 00 is a signature indicating the rest of the text is little endian UTF-32.
  • If neither of these are present, all of the text is big endian.
  • A real ZWNBSP at the start of a file requires a signature first.
UTF-32BE
  • All of the text is big endian: there is no signature.
  • Initial 00 00 FE FF is a ZWNBSP.
  • Any FF FE 00 00 is an error.
UTF-32LE
  • All of the text is little endian: there is no signature.
  • Initial FF FE 00 00 is a ZWNBSP.
  • Initial 00 00 FE FF is an error.
Note: The italicized names are not yet registered, but are useful for reference.
[from: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
posted on 2005-09-19 15:23 可冰 阅读(327) 评论(0)  编辑 收藏 引用 所属分类: UTF-8

只有注册用户登录后才能发表评论。
网站导航: 博客园   IT新闻   BlogJava   知识库   博问   管理