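Yesterday Lao Yutou (老鱼头) recommended the Ragel State Machine Compiler to us, a tool that can generate protocol-handling code. He also gave an example, just a few simple lines of code: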
int atoi( char *str )
{
    char *p = str;
    int cs, val = 0;
    bool neg = false;

    %%{
        action see_neg {
            neg = true;
        }

        action add_digit {
            val = val * 10 + (fc - '0');
        }

        main :=
            ( '-'@see_neg | '+' )? ( digit @add_digit )+
            '\n' @{ fbreak; };

        # Initialize and execute.
        write init;
        write exec noend;
    }%%

    if ( neg )
        val = -1 * val;

    if ( cs < atoi_first_final )
        cerr << "atoi: there was an error" << endl;

    return val;
};
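The generated state-machine code is reportedly more efficient than the 500-plus-line atoi implementation in C. The snippet above, for example, generates the following C code: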
int atoi( char *str )
{
char *p = str;
int cs, val = 0;
bool neg = false;
#line 27 "atoi.c"
{
cs = atoi_start;
}
#line 31 "atoi.c"
{
switch ( cs )
{
case 1:
switch( (*p) ) {
case 43: goto st2;
case 45: goto tr2;
}
if ( 48 <= (*p) && (*p) <= 57 )
goto tr3;
goto st0;
st0:
goto _out0;
tr2:
#line 23 "atoi.rl"
{
neg = true;
}
goto st2;
st2:
p += 1;
case 2:
#line 52 "atoi.c"
if ( 48 <= (*p) && (*p) <= 57 )
goto tr3;
goto st0;
tr3:
#line 27 "atoi.rl"
{
val = val * 10 + ((*p) - '0');
}
goto st3;
st3:
p += 1;
case 3:
#line 63 "atoi.c"
if ( (*p) == 10 )
goto tr4;
if ( 48 <= (*p) && (*p) <= 57 )
goto tr3;
goto st0;
tr4:
#line 33 "atoi.rl"
{ goto _out4; }
goto st4;
st4:
p += 1;
case 4:
#line 74 "atoi.c"
goto st0;
}
_out0: cs = 0; goto _out;
_out4: cs = 4; goto _out;
_out: {}
}
#line 38 "atoi.rl"
if ( neg )
val = -1 * val;
if ( cs < atoi_first_final )
cerr << "atoi: there was an error" << endl;
return val;
};
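To see what all those gotos amount to, here is a minimal hand-written Java sketch of the same four-state machine. This is not Ragel output: the class name AtoiSketch, the state constants, and the decision to throw NumberFormatException (the original only prints an error) are my own assumptions for illustration. It also stops at the end of the string rather than relying on a C string terminator, and like the original it does not guard against overflow.

```java
class AtoiSketch {
    // State numbers mirror the generated C: 0 = error, 1 = start,
    // 2 = sign seen, 3 = one or more digits seen, 4 = newline seen (final).
    static final int ERROR = 0, START = 1, SIGN = 2, DIGITS = 3, DONE = 4;
    static final int FIRST_FINAL = DONE; // plays the role of atoi_first_final

    static int atoi(String str) {
        int cs = START;
        int val = 0;
        boolean neg = false;
        // Stop on error, on the final state (the fbreak), or at end of input.
        for (int p = 0; p < str.length() && cs != ERROR && cs != DONE; p++) {
            char c = str.charAt(p);
            boolean isDigit = c >= '0' && c <= '9';
            switch (cs) {
                case START:
                    if (c == '-') { neg = true; cs = SIGN; }          // see_neg
                    else if (c == '+') { cs = SIGN; }
                    else if (isDigit) { val = val * 10 + (c - '0'); cs = DIGITS; }
                    else { cs = ERROR; }
                    break;
                case SIGN:
                    if (isDigit) { val = val * 10 + (c - '0'); cs = DIGITS; } // add_digit
                    else { cs = ERROR; }
                    break;
                case DIGITS:
                    if (isDigit) { val = val * 10 + (c - '0'); }
                    else if (c == '\n') { cs = DONE; }
                    else { cs = ERROR; }
                    break;
                default:
                    cs = ERROR;
            }
        }
        if (cs < FIRST_FINAL) {
            throw new NumberFormatException("atoi: there was an error");
        }
        return neg ? -val : val;
    }

    public static void main(String[] args) {
        System.out.println(atoi("-123\n")); // prints -123
        System.out.println(atoi("+42\n"));  // prints 42
    }
}
```

Writing this by hand for four states is easy enough; the point of Ragel is that the same table falls out of the one-line `main :=` expression, and keeps falling out when the grammar grows.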
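He said that Nginx spends several hundred lines on parsing the HTTP protocol, whereas with Ragel a hundred-odd lines get the job done, and more efficiently too; human hand-optimizers aren't worth much anymore (see the http11_parser.rl code on the site).
Today I gave it a try and used it to write a check for whether a Java String is made up entirely of digits: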
public class IsInt
{
    %%{
        machine is_int;
        write data noerror;
    }%%

    // Runs each check 100,000,000 times on two inputs and prints the elapsed milliseconds.
    public static void main(String[] args)
    {
        long begin = System.currentTimeMillis();
        for (int i=0; i<100000000; i++) {
            isIntStr("123456789p");
            isIntStr("8487389247");
        }
        System.out.println(System.currentTimeMillis() - begin);

        begin = System.currentTimeMillis();
        for (int i=0; i<100000000; i++) {
            isAllNumber("123456789p");
            isAllNumber("8487389247");
        }
        System.out.println(System.currentTimeMillis() - begin);
    }

    // Hand-written baseline: returns false at the first character outside '0'..'9' (48..57).
    public static boolean isAllNumber(String str)
    {
        char[] c = str.toCharArray();
        boolean blReturn = true;
        for(int ni=0; ni<c.length; ni++)
        {
            if(c[ni]<48 || c[ni]>57)
            {
                blReturn = false;
                break;
            }
        }
        return blReturn;
    }

    // Ragel version: the machine below is expanded into Java by the code generator.
    public static boolean isIntStr(String str)
    {
        char[] data = str.toCharArray();
        int p=0, cs=0;
        boolean isInt = true;
        %%{
            # Caution: `any` also matches digits, so as written this action fires on
            # the first character and isInt ends up false for every non-empty input.
            main := (digit+)? any @{ isInt = false; fbreak; };
            write init;
            write exec noend;
        }%%
        return isInt;
    }
}
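Generate the Java code with the command ragel.exe -J IsInt.rl | rlgen-java.exe, then compile and run it. The result (elapsed milliseconds, isIntStr first, then isAllNumber):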
27750
30938
So in this run the generated code came out faster than the simple implementation, by roughly 10% :)
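One thing worth double-checking in the isIntStr machine: in Ragel, `any` matches digits as well, so `(digit+)? any` can take the `any` branch on the very first character, and `write exec noend` does no bounds check on the input. Below is a minimal hand-written Java sketch of what I assume the check was meant to be, namely "one or more digits and nothing else". The class name IsIntSketch and the state constants are made up for illustration, and unlike isAllNumber it treats the empty string as not a number. In Ragel terms this would correspond roughly to matching `digit+` over a bounded input and testing for a final state at the end.

```java
class IsIntSketch {
    // Two states: START (nothing read yet) and DIGITS (only digits so far).
    // DIGITS is the only accepting state.
    static final int START = 0, DIGITS = 1;

    /** Returns true only if str is non-empty and every char is an ASCII digit. */
    static boolean isIntStr(String str) {
        int cs = START;
        for (int p = 0; p < str.length(); p++) {
            char c = str.charAt(p);
            if (c < '0' || c > '9') {
                return false; // non-digit: reject and stop scanning early
            }
            cs = DIGITS;
        }
        return cs == DIGITS; // accept only if at least one digit was read
    }

    public static void main(String[] args) {
        System.out.println(isIntStr("8487389247")); // prints true
        System.out.println(isIntStr("123456789p")); // prints false
    }
}
```

Reading characters with charAt also avoids the toCharArray copy that both benchmarked methods pay on every call, which probably accounts for a good share of the measured time.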
It turns out that Mongrel, the server used in the RoR stack, is built with Ragel as well.