可冰 - C++ Blog

On the type parameters of templates

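After reading the IDL code written by cpunion, I learned about the following usage:
In a template argument list, a type argument can be written like this:
template_class< type( type1, type2, ... ) > a_class;
For example, it can be void( void ), void(), void( int ), or int( void ), string( int ) and so on, and the compiler treats them as distinct types (except that void( void ) and void() name the same type). I wrote some code to test this (see the end of the post). But I only have an intuitive feel for it; as for why it is allowed (I have never seen a book describe this usage), I have no idea.
I hope someone can clear this up for me, and I hope cpunion can help too. Thanks!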

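#include <iostream>
#include <string>
#include <typeinfo>

// Pointer-to-function type; distinct from the function type void( int ).
typedef void(*fun)(int);

using namespace std;

// Primary template: used for every T that has no explicit specialization.
template< typename T >
struct Base
{
    void test()
    {
        cout << "Base<T>" << "\t=\t";
        cout << "Base<" << typeid(T).name() << ">" << endl;
    }
};

template<>
struct Base< void >
{
    void test()
    {
        cout << "Base<void>" << endl;
    }
};

// Specialization for a function type.
template<>
struct Base< void( int ) >
{
    void test()
    {
        cout << "Base<void( int )>" << endl;
    }
};

// Specialization for a pointer-to-function type.
template<>
struct Base< fun >
{
    void test()
    {
        cout << "Base<fun>" << endl;
    }
};

template<>
struct Base< int( string, int, char ) >
{
    void test()
    {
        cout << "Base<int( string, int, char )>" << endl;
    }
};

int main(int argc, char* argv[])
{
    Base< void > b_void;
    Base< void( int ) > b_void_int;
    b_void.test();
    b_void_int.test();

    Base< int( string, int, char ) > b_int;
    Base< fun > b_fun;
    b_int.test();
    b_fun.test();

    // A function type whose return and parameter types are themselves
    // Base specializations; it falls back to the primary template.
    Base< Base< void > ( Base < int ( string, int, char ) > ) > b_complex;
    b_complex.test();

    return 0;
}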
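One way to look at the question above: `void( int )` is just the name of a function type ("function taking int and returning void"), and any type can be passed as a template type argument. The following is a minimal sketch, assuming a C++11 compiler; `Signature` and `arity_of` are hypothetical names made up for illustration and are not taken from cpunion's IDL code. It shows that a function type differs from a pointer-to-function type, and that a partial specialization can take a function type apart.

```cpp
#include <string>
#include <type_traits>

// A function type and a pointer-to-function type are different types,
// which is why Base< void( int ) > and Base< fun > can both be specialized.
static_assert(!std::is_same<void(int), void(*)(int)>::value,
              "function type differs from pointer-to-function type");

// A parameter list of (void) means "no parameters", so these are one type.
static_assert(std::is_same<void(void), void()>::value,
              "void(void) and void() name the same type");

// Hypothetical helper: the primary template is only declared; the partial
// specializations below match function types and expose their parts.
template <typename T>
struct Signature;

// Matches any function type with exactly one parameter.
template <typename R, typename A>
struct Signature<R(A)>
{
    typedef R result_type;    // return type deduced from R(A)
    typedef A argument_type;  // parameter type deduced from R(A)
    static const int arity = 1;
};

// Matches any function type with no parameters.
template <typename R>
struct Signature<R()>
{
    typedef R result_type;
    static const int arity = 0;
};

// Returns the number of parameters of a supported function type.
template <typename F>
int arity_of()
{
    return Signature<F>::arity;
}
```

Calling `arity_of<std::string(int)>()` selects the one-parameter specialization. Packing a whole signature into a single template argument and unpacking it by pattern matching is a common reason for writing arguments in this form.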

posted @ 2005-09-29 19:51 可冰 | Views (2254) | Comments (9)

Lyrics of the ending theme of <<神话>> (The Myth)

隔世缠绵今世难续缘
心依然梦里的诺言未改变

人间凄然
不如一笑去了烛花散
尘花醉酒
轮回前尘成云烟

霓红点点凡间迷乱情浅
如当年幻梦间牵手
心依然

因果皆缘
不如一醉罢了浮花散
情语醉人轮回前尘成誓言

剑如飞心如水
也隔不断相思泪
歌不悔心还醉
究竟是为谁

爱若苦心无顾
谁拿爱情一生赌
翅断了碟儿飞了
化作一世深缘故

[Source: http://msn.ynet.com/Events.jsp?eid=6367298]

posted @ 2005-09-25 10:26 可冰 | Views (374) | Comments (0)

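error C2899: 不能在模板声明之外使用类型名称 ("a type name cannot be used outside a template declaration") ?!!

The day before yesterday I ran into a problem that puzzled me; I had no idea what caused it. What would you make of the message "不能在模板声明之外使用类型名称" (literally, "a type name cannot be used outside a template declaration")? Only when I happened to press F1 and read the description in MSDN did I realize that the typename keyword was being misused. It was the English text that made it clear: "typename cannot be used outside a template declaration". I would never have guessed that typename had been translated as "类型名称" ("type name"). It seems that for baffling errors it is still worth reading the English help, though it would be better to have the English edition of VS.NET from the start.
Here are the details: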

namespace code
{
    enum CodeType { UTF_8, UNICODE };

    template< CodeType srcT, CodeType desT >
    struct ConvertType {};

    template<>
    struct ConvertType< UTF_8, UNICODE >
    {
        typedef char srcType;
        typedef wchar_t desType;
    };

    template< CodeType srcT, CodeType desT >
    struct Convert {};

    template<>
    struct Convert< UTF_8, UNICODE >
    {
        // error C2899: typename cannot be used outside a template declaration
        typedef typename ConvertType< UTF_8, UNICODE >::srcType srcType; //!
        typedef typename ConvertType< UTF_8, UNICODE >::desType desType; //!
    };
} //namespace code

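/*
typename is not needed here at all.
Apart from introducing a type parameter in a template declaration, typename may only be used inside a template, to state that a member of a class template is a type.
For example:

template< typename T > class X {};

// Another way
template< class T > struct X {
    typedef double DoubleType;
    typename X< T >::DoubleType a; // X< T >::DoubleType is a type
};

If the class is not a class template, typename must not be used. There it is not merely redundant; it is an error. (This is the C++98/03 rule that VS.NET enforces; C++11 and later accept the redundant typename.)
For example:

template<> struct X< X< int > > {
    typename X< int >::DoubleType a; // Error! X< int > is not a generic class
    X< int >::DoubleType b;          // OK!
};

My code above is the same situation: ConvertType< UTF_8, UNICODE > is already a concrete class, not a class template, so typename must not be placed before ConvertType< UTF_8, UNICODE >::srcType.
*/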
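For contrast, here is a small sketch of the case where typename really is required. It is only an illustration: `Traits`, `Buffer`, `WideChar` and `element_size` are hypothetical names, loosely modelled on ConvertType above. Inside a template, a name such as `Traits<N>::value_type` depends on the template parameter, so the compiler has to be told that it is a type; outside a template the class is concrete and no typename is needed.

```cpp
#include <cstddef>

// Hypothetical traits class, loosely modelled on ConvertType.
template <int N>
struct Traits
{
    typedef char value_type;
};

// Explicit specialization choosing a different element type.
template <>
struct Traits<2>
{
    typedef wchar_t value_type;
};

// Inside a template, Traits<N>::value_type depends on N, so typename is
// required to say that the qualified name refers to a type.
template <int N>
struct Buffer
{
    typedef typename Traits<N>::value_type value_type;
};

// Outside any template, Traits<2> is a concrete class: no typename here.
typedef Traits<2>::value_type WideChar;

// Size in bytes of the element type selected for a given N.
template <int N>
std::size_t element_size()
{
    return sizeof(typename Buffer<N>::value_type);
}
```

Here `element_size<1>()` is the size of char and `element_size<2>()` the size of wchar_t.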

posted @ 2005-09-24 15:49 可冰 | Views (9017) | Comments (6)

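Planning a UTF-8 decoding module

I want to implement an "engine" that decodes UTF-8 documents into Unicode code points, and it should be convenient to use.
But after several days of thinking I still have no suitable design.
Sigh......
Today I tried writing a bit of code to get a feel for it; I will keep thinking...

posted @ 2005-09-22 23:24 可冰 | Views (1064) | Comments (1)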

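How does std::wfstream support wide characters?

std::wfstream is defined as:
typedef basic_fstream<wchar_t, char_traits<wchar_t> > wfstream;
When reading a character:
wfstream wfile( "wcharfile.txt" );
wchar_t wch = wfile.get();
Semantically this ought to read two bytes of content. But checking the output shows that it reads only one byte, so how is it any different from fstream?
How, then, should wide-character streams be used when processing Unicode-encoded files?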

posted @ 2005-09-22 22:47 可冰 | Views (2893) | Comments (4)

Several different encoded representations of "这是一个UTF-8格式的文档!" ("This is a UTF-8 document!")

posted @ 2005-09-20 20:39 可冰 | Views (495) | Comments (1)

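Summary of the UTF-8 encoding format

[The following is only my own summary; if anything is wrong, please correct me. Thanks!]
The following byte sequences are used to represent a character. Which sequence is used depends on the character's code number in Unicode.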

Code number range:	Bit pattern	Byte values (hex)	Note
U+00000000 - U+0000007F:	0 xxxxxxx	0x - 7x
U+00000080 - U+000007FF:	110 xxxxx 10 xxxxxx	Cx 8x - Dx Bx
U+00000800 - U+0000FFFF:	1110 xxxx 10 xxxxxx 10 xxxxxx	Ex 8x 8x - Ex Bx Bx
U+00010000 - U+001FFFFF:	11110 xxx 10 xxxxxx 10 xxxxxx 10 xxxxxx	F0 8x 8x 8x - F7 Bx Bx Bx	rarely used
U+00200000 - U+03FFFFFF:	111110 xx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx	F8 8x 8x 8x 8x - FB Bx Bx Bx Bx
U+04000000 - U+7FFFFFFF:	1111110 x 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx 10 xxxxxx	FC 8x 8x 8x 8x 8x - FD Bx Bx Bx Bx Bx

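* The bytes 0xFE and 0xFF never appear in the encoding.
* Every byte except the first lies in the range 0x80 to 0xBF. The start of each multi-byte character can be recognized from its lead byte, which lies in 0xC0-0xDF, 0xE0-0xEF, 0xF0-0xF7 and so on (check the high bits of the byte); a byte below 0x80 is a single-byte character. Any byte in the range 0x80 to 0xBF is a continuation byte and must be skipped when counting characters.
* Unicode is a coded character set: it only assigns each character a number (Unicode actually does more than that, for example it provides algorithms for comparison, display and so on);
whereas UTF-8 is an encoding scheme: it defines how those numbers are represented and stored.
* Converting UTF-8 to Unicode: remove all the marker bits and join the remaining bits; if there are too few, pad with zeros at the high end to make up 32 bits.
* Converting Unicode to UTF-8: starting from the low end, take 6 bits at a time and prefix them with the two bits 10; the bits that remain (leading zeros not counted) go into the first byte, after the marker for the sequence length: 0, 110, 1110 and so on.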

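The two decoding-related points above (skipping continuation bytes when counting, and stripping marker bits to recover the code number) can be sketched as follows. This is a minimal illustration assuming C++11, not a complete decoder: the function names are made up, only the 1- to 4-byte forms are handled, and overlong sequences are not rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// True for bytes of the form 10xxxxxx (0x80-0xBF).
inline bool is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Counts characters by skipping continuation bytes (assumes valid UTF-8).
inline std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
        if (!is_continuation(static_cast<unsigned char>(utf8[i])))
            ++n;
    return n;
}

// Decodes UTF-8 into 32-bit code numbers by stripping the marker bits
// and concatenating the payload bits.
inline std::vector<std::uint32_t> decode_utf8(const std::string& utf8)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("unsupported or stray lead byte");

        if (utf8.size() - i <= extra)
            throw std::runtime_error("truncated sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if (!is_continuation(b))
                throw std::runtime_error("expected continuation byte");
            cp = (cp << 6) | (b & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For instance, the bytes C2 A9 decode to U+00A9 and E2 89 A0 decode to U+2260.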
posted @ 2005-09-19 20:03 可冰 | Views (10374) | Comments (3)

UTF types

UTF	Estimated average storage required per page (3000 characters)	Notes
UTF-8	3 KB (1999) 5 KB (2003)	On average, English takes slightly over one unit per code point. Most Latin-script languages take about 1.1 bytes. Greek, Russian, Arabic and Hebrew take about 1.7 bytes, and most others (including Japanese, Chinese, Korean and Hindi) take about 3 bytes. Characters in surrogate space take 4 bytes, but as a proportion of all world text they will always be very rare.
UTF-16	6 KB	All of the most common characters in use for all modern writing systems are already represented with 2 bytes. Characters in surrogate space take 4 bytes, but as a proportion of all world text they will always be very rare.
UTF-32	12 KB	All take 4 bytes

[Source: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
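The per-page figures in the table follow from the number of bytes each form needs per character. The sketch below is an illustration assuming C++11; the names `utf8_bytes`, `utf16_bytes`, `utf32_bytes`, `Sizes` and `storage_for` are hypothetical. It adds up the storage for a sequence of code points, covering code points up to U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for one code point in UTF-8 (1 to 4 bytes up to U+10FFFF).
inline std::size_t utf8_bytes(std::uint32_t cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// UTF-16 uses 2 bytes inside the BMP and a 4-byte surrogate pair above it.
inline std::size_t utf16_bytes(std::uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

// UTF-32 always uses 4 bytes.
inline std::size_t utf32_bytes(std::uint32_t)
{
    return 4;
}

// Total storage in bytes for the same text in each form.
struct Sizes
{
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

inline Sizes storage_for(const std::vector<std::uint32_t>& text)
{
    Sizes s = {0, 0, 0};
    for (std::size_t i = 0; i < text.size(); ++i) {
        s.utf8 += utf8_bytes(text[i]);
        s.utf16 += utf16_bytes(text[i]);
        s.utf32 += utf32_bytes(text[i]);
    }
    return s;
}
```

A page of 3000 ASCII characters comes to 3000, 6000 and 12000 bytes, which matches the 3 KB, 6 KB and 12 KB rows; a page of 3000 Chinese characters would need about 9000 bytes in UTF-8.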

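UTF-8 (ISO 10646-1) has the following properties:

* UCS characters U+0000 to U+007F (ASCII) are encoded as the bytes 0x00 to 0x7F (ASCII compatible). This means that a file containing only 7-bit ASCII characters is identical under ASCII and UTF-8.
* All UCS characters > U+007F are encoded as a sequence of several bytes, each of which has its most significant bit set. Therefore an ASCII byte (0x00-0x7F) can never appear as part of any other character.
* The first byte of a multi-byte sequence representing a non-ASCII character is always in the range 0xC0 to 0xFD and indicates how many bytes the character occupies. All further bytes of the sequence are in the range 0x80 to 0xBF. This makes resynchronization easy, makes the encoding stateless, and makes it robust against missing bytes.
* All possible 2³¹ UCS codes can be encoded.
* UTF-8 encoded characters can in theory be up to 6 bytes long, but 16-bit BMP characters take at most 3 bytes.
* The sorting order of big-endian UCS-4 byte strings is preserved.
* The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.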

The following byte sequences are used to represent a character. Which sequence is used depends on the character's code number in Unicode.

U-00000000 - U-0000007F:	0xxxxxxx
U-00000080 - U-000007FF:	110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF:	1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF:	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF:	111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF:	1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

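The xxx positions are filled with the bits of the character's code number in binary. The rightmost x is the least significant bit. Only the shortest multi-byte sequence that can represent a character's code number may be used. Note that in a multi-byte sequence, the number of leading 1 bits in the first byte equals the number of bytes in the whole sequence.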

For example, the Unicode character U+00A9 = 1010 1001 (copyright sign) is encoded in UTF-8 as:

11000010 10101001 = 0xC2 0xA9

and the character U+2260 = 0010 0010 0110 0000 (not equal to) is encoded as:

11100010 10001001 10100000 = 0xE2 0x89 0xA0
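The two worked examples can be reproduced with a small encoder. This is a sketch under stated assumptions (C++11; `encode_utf8` is a hypothetical name): it implements only the first four rows of the table, which is also all that UTF-8 as later restricted to U+10FFFF requires, and it does not reject surrogate code points.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Encodes one code number as UTF-8 using the shortest possible sequence.
// Payload bits are taken 6 at a time from the low end and prefixed with 10;
// the remaining high bits go into the lead byte after the length marker.
inline std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x200000) {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::out_of_range("5- and 6-byte forms are not handled in this sketch");
    }
    return out;
}
```

With this function, `encode_utf8(0x00A9)` yields the bytes C2 A9 and `encode_utf8(0x2260)` yields E2 89 A0, as in the examples above.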

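The official name of this encoding is spelled UTF-8, where UTF stands for UCS Transformation Format. Please do not refer to UTF-8 by any other name (such as utf8 or UTF_8) in any document, unless of course you mean a variable name rather than the encoding itself.

Which programming languages support Unicode?

Most modern programming languages developed after about 1993 have a special data type for Unicode/ISO 10646-1 characters. In Ada95 it is called Wide_Character; in Java it is called char.

ISO C also specifies mechanisms for handling multi-byte encodings and wide characters, and Amendment 1 to ISO C added even more when it was published in September 1994. These mechanisms were designed mainly with the various East Asian encodings in mind, and they are considerably more elaborate than what handling UCS alone would require. UTF-8 is an example of what the ISO C standard calls a multi-byte encoding, and the wchar_t type can be used to hold Unicode characters.
[Source: http://www.linuxforum.net/books/UTF-8-Unicode.html]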

posted @ 2005-09-19 15:38 可冰 | Views (382) | Comments (0)

UTF serializations

UTF-8	Initial `EF BB BF` is a signature, indicating that the rest of the file is UTF-8. Any `EF BF BE` is an error. A real ZWNBSP at the start of a file requires a signature first.
UTF-8N	All of the text is normal UTF-8; there is no signature. Initial `EF BB BF` is a ZWNBSP. Any `EF BF BE` is an error.
UTF-16	Initial `FE FF` is a signature indicating the rest of the text is big endian UTF-16. Initial `FF FE` is a signature indicating the rest of the text is little endian UTF-16. If neither of these are present, all of the text is big endian. A real ZWNBSP at the start of a file requires a signature first.
UTF-16BE	All of the text is big endian: there is no signature. Initial `FE FF` is a ZWNBSP. Any `FF FE` is an error.
UTF-16LE	All of the text is little endian: there is no signature. Initial `FF FE` is a ZWNBSP. Any `FE FF` is an error.
UTF-32	Initial `00 00 FE FF` is a signature indicating the rest of the text is big endian UTF-32. Initial `FF FE 00 00` is a signature indicating the rest of the text is little endian UTF-32. If neither of these are present, all of the text is big endian. A real ZWNBSP at the start of a file requires a signature first.
UTF-32BE	All of the text is big endian: there is no signature. Initial `00 00 FE FF` is a ZWNBSP. Any `FF FE 00 00` is an error.
UTF-32LE	All of the text is little endian: there is no signature. Initial `FF FE 00 00` is a ZWNBSP. Initial `00 00 FE FF` is an error.

Note: The italicized names are not yet registered, but are useful for reference.

[from: http://icu.sourceforge.net/docs/papers/forms_of_unicode/]
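The signature rules for the unmarked forms (UTF-8, UTF-16, UTF-32) can be turned into a small detection routine. The sketch below is only an illustration; the enum `Bom` and the functions `starts_with_bytes` and `detect_bom` are hypothetical names. Note that `FF FE 00 00` is ambiguous with UTF-16 little endian text that begins with U+0000, so this sketch assumes UTF-32 in that case by testing the 4-byte signatures first.

```cpp
#include <cstddef>
#include <string>

enum Bom { BomNone, BomUtf8, BomUtf16BE, BomUtf16LE, BomUtf32BE, BomUtf32LE };

// True if the first n bytes of data equal sig.
inline bool starts_with_bytes(const std::string& data, const char* sig, std::size_t n)
{
    return data.size() >= n && data.compare(0, n, sig, n) == 0;
}

// Identifies the signature at the start of a byte buffer, if any.
// The 4-byte UTF-32 signatures are tested before the 2-byte UTF-16 ones
// because FF FE is a prefix of FF FE 00 00.
inline Bom detect_bom(const std::string& data)
{
    if (starts_with_bytes(data, "\x00\x00\xFE\xFF", 4)) return BomUtf32BE;
    if (starts_with_bytes(data, "\xFF\xFE\x00\x00", 4)) return BomUtf32LE;
    if (starts_with_bytes(data, "\xEF\xBB\xBF", 3))     return BomUtf8;
    if (starts_with_bytes(data, "\xFE\xFF", 2))         return BomUtf16BE;
    if (starts_with_bytes(data, "\xFF\xFE", 2))         return BomUtf16LE;
    return BomNone;  // per the table, unmarked UTF-16/UTF-32 is big endian
}
```

Because the buffers may contain zero bytes, callers should build the `std::string` with an explicit length, for example `std::string(buffer, length)`.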

posted @ 2005-09-19 15:23 可冰 | Views (346) | Comments (0)

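Some questions that came up while learning assembly

In assembly, when CALL is used to invoke a subroutine, the processor has to save the current state. But which register values does it actually save?
The first thing saved should be the return address, but could this step be carried out explicitly by other code? That is, could push, mov and the like replace the work that CALL does? Is that possible?

Also, where are local variables in C/C++ allocated? I seem to remember it being on the heap, but I am no longer sure. How is this done in assembly? I have looked at the disassembly of C code and still have not figured it out.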
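As a rough model for these questions: on x86, a near CALL saves only the return address, by pushing it onto the stack before jumping, and compilers normally place local variables on that same stack by moving the stack pointer. The toy simulation below is not real x86; `ToyCpu` and all of its members are hypothetical names invented for illustration, and it models the stack as a growing vector rather than a descending memory region.

```cpp
#include <cstddef>
#include <vector>

// Toy machine with a program counter and a stack of words.
struct ToyCpu
{
    std::size_t pc;
    std::vector<std::size_t> stack;

    ToyCpu() : pc(0) {}

    void push(std::size_t value) { stack.push_back(value); }

    std::size_t pop()
    {
        const std::size_t value = stack.back();
        stack.pop_back();
        return value;
    }

    // CALL target: push the address of the next instruction, then jump.
    // Written out by hand this is "push return_address" plus "jmp target".
    void call(std::size_t target, std::size_t return_address)
    {
        push(return_address);
        pc = target;
    }

    // Function prologue: reserve stack slots for locals (like "sub esp, n").
    void reserve_locals(std::size_t count)
    {
        stack.resize(stack.size() + count, 0);
    }

    // Function epilogue: release the locals again (like "add esp, n").
    void release_locals(std::size_t count)
    {
        stack.resize(stack.size() - count);
    }

    // RET: pop the saved return address back into the program counter.
    void ret() { pc = pop(); }
};
```

In this model nothing but the return address is saved by the call itself; any other register the caller or callee wants to keep has to be pushed explicitly.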

posted @ 2005-09-19 12:57 可冰 | Views (482) | Comments (4)
