[bash source code analysis] 3 Syntax analysis - Entry point - 糯米


--- main()
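    Open main() in shell.c. It runs to roughly 300 lines, and most of them revolve around
    the various xxx_init calls, i.e. initialization work. We can skip all of that for now
    and come back to it when a question arises. Look instead at the call near the very end:
    reader_loop(). This is the function that repeatedly reads a command and executes it.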

--- reader_loop()
    In reader_loop(), located in eval.c, the only call that seems to matter is read_command().

--- read_command()
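    read_command() also lives in eval.c. The ALARM signal handling at its start is puzzling
    at first: is there really a time limit on typing a command into bash? (There is one: this
    code serves the TMOUT variable, which logs out an idle interactive shell.) Either way, such
    out-of-the-way, non-critical details should be ignored in the early stage of reading a code
    base. If you dig into them before you have a grip on the main line of the code, you end up
    in a dead end and lose the fun of source analysis.
    The main line continues into parse_command().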

--- parse_command()
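    parse_command() is in eval.c as well. The yyparse() function it calls is where syntax
    analysis begins. Anyone who has used yacc will recognize this immediately. Seeing a file
    named y.tab.c in the file list already tells us that bash relies on yacc-generated code
    for its syntax analysis.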
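
    The call chain traced so far can be modelled in a few lines of C. This is only a sketch
    of the control flow, not bash's code: every sketch_ name is hypothetical, the "parser"
    just takes the next string from an array, and "executing" a command merely counts it.
    The real functions take no arguments and do far more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for bash's globals; names and types are assumptions. */
static const char *const *input_lines;  /* pending input, NULL-terminated */
static const char *global_command;      /* result of the most recent parse */
static int eof_reached;
static int commands_executed;

/* Stand-in for the yacc-generated yyparse(): "parse" one line of input. */
static int sketch_yyparse(void)
{
    if (*input_lines == NULL) {
        eof_reached = 1;
        global_command = NULL;
        return 0;
    }
    global_command = *input_lines++;
    return 0;
}

/* parse_command() hands the work to the parser. */
static int sketch_parse_command(void)
{
    return sketch_yyparse();
}

/* read_command() is where bash deals with the alarm before parsing;
   this sketch skips that part entirely. */
static int sketch_read_command(void)
{
    return sketch_parse_command();
}

/* Stand-in for command execution: only count the command. */
static void sketch_execute(const char *command)
{
    (void)command;
    commands_executed++;
}

/* reader_loop(): read a command, execute it, repeat until end of input.
   Returns the number of commands that were "executed". */
int sketch_reader_loop(const char *const *lines)
{
    assert(lines != NULL);
    input_lines = lines;
    eof_reached = 0;
    commands_executed = 0;

    while (!eof_reached) {
        if (sketch_read_command() == 0 && global_command != NULL)
            sketch_execute(global_command);
    }
    return commands_executed;
}
```

    Passing an array such as { "ls", "echo hi", NULL } to sketch_reader_loop() walks the
    chain once per entry and then stops at the terminating NULL.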

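--- What Yacc does
    You only need to tell yacc three things: the grammar, the action (handler code) for each
    grammar rule, and the function that performs lexical analysis.
    Yacc then generates y.tab.c for you. Calling yyparse() from that file carries out the
    front end of a compiler: it pulls tokens from the lexer you supplied and performs the
    syntax analysis. While it parses, the action you attached to each rule is invoked as that
    rule is recognized. For a proper introduction to yacc, search online; there is plenty
    of material.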

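    Example:
    Tell yacc the grammar rules and the action attached to each one:
    expr : expr '+' expr { $$ = add($1, $3); }
         | expr '*' expr { $$ = mul($1, $3); }
         | expr '-' expr { $$ = sub($1, $3); }
         | NUMBER
         ;
    Call yyparse() with the input 1 + 2,
    and add(1, 2) gets called back.
    Inside an action, $$ is the value the rule produces, i.e. the value of the expr on the
    left-hand side.
    $1 is the first element of the rule (expr)
    $2 is the second element of the rule ('+')
    $3 is the third element of the rule (expr)
    The rule consisting only of NUMBER has no action, so the default $$ = $1 applies.
    The types of these elements are declared earlier in the file, for example
    %type <number> expr, where number names a member of the %union (not a C type).
    A real grammar would also declare operator precedence, for instance with %left.
    For the details, it is still best to read a proper article on yacc.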
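
    To make the callback idea concrete, here is a hand-written stand-in for what the
    generated parser does with 1 + 2. It is not yacc output and not an LALR parser: it is a
    small left-to-right loop with no operator precedence, which is a deliberate
    simplification. next_token(), parse_expr() and the TOK_ constants are names made up for
    this sketch; add(), mul() and sub() play the role of the actions in the example grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

static int add_calls;  /* how many times the '+' action has fired */

/* The "actions" attached to the grammar rules. */
static int add(int a, int b) { add_calls++; return a + b; }
static int mul(int a, int b) { return a * b; }
static int sub(int a, int b) { return a - b; }

enum { TOK_END = 0, TOK_NUMBER = 256 };

/* Lexer stand-in: skip blanks, then return TOK_NUMBER (value in *value),
   TOK_END at end of input, or the character itself for anything else. */
static int next_token(const char **p, int *value)
{
    while (isspace((unsigned char)**p))
        (*p)++;
    if (**p == '\0')
        return TOK_END;
    if (isdigit((unsigned char)**p)) {
        char *end;
        *value = (int)strtol(*p, &end, 10);
        *p = end;
        return TOK_NUMBER;
    }
    return (unsigned char)*(*p)++;
}

/* Parser stand-in for: expr : NUMBER | expr op NUMBER
   Evaluates strictly left to right. Returns 0 on success, -1 on a syntax error. */
int parse_expr(const char *input, int *result)
{
    int lhs, rhs, op;

    assert(input != NULL && result != NULL);
    if (next_token(&input, &lhs) != TOK_NUMBER)
        return -1;

    while ((op = next_token(&input, &rhs)) != TOK_END) {
        if (next_token(&input, &rhs) != TOK_NUMBER)
            return -1;
        switch (op) {
        case '+': lhs = add(lhs, rhs); break;  /* $$ = add($1, $3); */
        case '*': lhs = mul(lhs, rhs); break;  /* $$ = mul($1, $3); */
        case '-': lhs = sub(lhs, rhs); break;  /* $$ = sub($1, $3); */
        default:  return -1;
        }
    }
    *result = lhs;
    return 0;
}
```

    Calling parse_expr("1 + 2", &r) ends up invoking add(1, 2) exactly once, which is the
    callback described above. With yacc the loop and the switch are generated from the
    grammar; you only write the rules and the actions.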

--- parse.y
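    Looking at the Makefile, we find:
    y.tab.c y.tab.h: parse.y
        $(YACC) -d $(srcdir)/parse.y
    So y.tab.c is generated from parse.y. parse.y contains the grammar and the actions that
    go with it, which makes it the core file of the syntax analysis.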

    It starts with a %union definition:
    %union {
        WORD_DESC *word;        /* the word that we read. */
        int number;            /* the number that we read. */
        WORD_LIST *word_list;
        COMMAND *command;
        REDIRECT *redirect;
        ELEMENT element;
        PATTERN_LIST *pattern;
    }

    Then comes a series of token definitions:

/* Reserved words. Members of the first group are only recognized
   in the case that they are preceded by a list_terminator. Members
   of the second group are for [[...]] commands. Members of the
   third group are recognized only under special circumstances. */
%token IF THEN ELSE ELIF FI CASE ESAC FOR SELECT WHILE UNTIL DO DONE FUNCTION
%token COND_START COND_END COND_ERROR
%token IN BANG TIME TIMEOPT

/* More general tokens. yylex () knows how to make these. */
%token <word> WORD ASSIGNMENT_WORD
%token <number> NUMBER
%token <word_list> ARITH_CMD ARITH_FOR_EXPRS
%token <command> COND_CMD
%token AND_AND OR_OR GREATER_GREATER LESS_LESS LESS_AND LESS_LESS_LESS
%token GREATER_AND SEMI_SEMI LESS_LESS_MINUS AND_GREATER LESS_GREATER
%token GREATER_BAR

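    Reading the character stream and returning tokens is the job of the lexical analysis
    function. It always returns the token code as an int.
    A token declared with plain %token carries no extra value beyond that code.
    A token declared with %token <word> also carries a semantic value, stored in the
    matching %union member (here yylval.word).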
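
    The relationship between %union, the token code and the semantic value can be sketched
    in plain C. This is an illustration under simplifying assumptions, not bash's lexer: the
    SKETCH_ and SK_ names, the token numbers, and the rule "all digits means NUMBER" are
    invented for the example, and the union keeps only two of the members shown above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for bash's WORD_DESC. */
typedef struct {
    const char *word;
} SKETCH_WORD_DESC;

/* Roughly what yacc derives from a %union: a C union type plus one
   global variable of that type, through which the lexer passes values. */
typedef union {
    SKETCH_WORD_DESC *word;  /* used by tokens declared as <word> */
    int number;              /* used by tokens declared as <number> */
} SKETCH_YYSTYPE;

static SKETCH_YYSTYPE sketch_yylval;
static SKETCH_WORD_DESC current_word;  /* storage for the last word seen */

/* Token codes; yacc numbers named tokens above the character range. */
enum { SK_WORD = 258, SK_NUMBER = 259, SK_AND_AND = 260 };

/* Classify one already-separated piece of input. The return value is
   always the int token code; the semantic value, if any, goes into
   sketch_yylval. */
int sketch_classify(const char *text)
{
    assert(text != NULL);

    if (strcmp(text, "&&") == 0)
        return SK_AND_AND;                 /* plain %token: code only */

    if (text[0] != '\0' && strspn(text, "0123456789") == strlen(text)) {
        sketch_yylval.number = atoi(text); /* %token <number> */
        return SK_NUMBER;
    }

    current_word.word = text;              /* %token <word> */
    sketch_yylval.word = &current_word;
    return SK_WORD;
}
```

    After sketch_classify("42") the caller sees SK_NUMBER and reads sketch_yylval.number;
    after sketch_classify("ls") it sees SK_WORD and reads sketch_yylval.word. In a grammar
    action the same values appear as $1, $2 and so on.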

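    The lexical analysis function is not generated by lex, which is why the project has no
    .lex file. yylex() is written by hand in the C section of parse.y and relies on
    read_token() to do the work.
    There are tables, though, that show which string each token corresponds to: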

/* Reserved words. These are only recognized as the first word of a
   command. */
STRING_INT_ALIST word_token_alist[] = {
{ "if", IF },
{ "then", THEN },
{ "else", ELSE },
{ "elif", ELIF },
{ "fi", FI },
{ "case", CASE },
{ "esac", ESAC },
{ "for", FOR },
#if defined (SELECT_COMMAND)
{ "select", SELECT },
#endif
{ "while", WHILE },
{ "until", UNTIL },
{ "do", DO },
{ "done", DONE },
{ "in", IN },
{ "function", FUNCTION },
#if defined (COMMAND_TIMING)
{ "time", TIME },
#endif
{ "{", '{' },
{ "}", '}' },
{ "!", BANG },
#if defined (COND_COMMAND)
{ "[[", COND_START },
{ "]]", COND_END },
#endif
{ (char *)NULL, 0}
};

/* other tokens that can be returned by read_token() */
STRING_INT_ALIST other_token_alist[] = {
/* Multiple-character tokens with special values */
{ "-p", TIMEOPT },
{ "&&", AND_AND },
{ "||", OR_OR },
{ ">>", GREATER_GREATER },
{ "<<", LESS_LESS },
{ "<&", LESS_AND },
{ ">&", GREATER_AND },
{ ";;", SEMI_SEMI },
{ "<<-", LESS_LESS_MINUS },
{ "<<<", LESS_LESS_LESS },
{ "&>", AND_GREATER },
{ "<>", LESS_GREATER },
{ ">|", GREATER_BAR },
{ "EOF", yacc_EOF },
/* Tokens whose value is the character itself */
{ ">", '>' },
{ "<", '<' },
{ "-", '-' },
{ "{", '{' },
{ "}", '}' },
{ ";", ';' },
{ "(", '(' },
{ ")", ')' },
{ "|", '|' },
{ "&", '&' },
{ "newline", '\n' },
{ (char *)NULL, 0}
};

/* others not listed here:
    WORD            look at yylval.word
    ASSIGNMENT_WORD        look at yylval.word
    NUMBER            look at yylval.number
    ARITH_CMD        look at yylval.word_list
    ARITH_FOR_EXPRS        look at yylval.word_list
    COND_CMD        look at yylval.command
*/
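
    A table shaped like the two above is typically searched with a linear scan that stops at
    the NULL sentinel. The sketch below shows that idea in both directions (string to token,
    token to string). The SKETCH_ALIST type, the SK_ token numbers and both function names
    are assumptions made for illustration; they are not taken from bash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for STRING_INT_ALIST: a string paired with a token code. */
typedef struct {
    const char *word;
    int token;
} SKETCH_ALIST;

enum { SK_IF = 258, SK_THEN, SK_FI };

/* A shortened table in the same layout, terminated by a NULL entry. */
static const SKETCH_ALIST sketch_word_tokens[] = {
    { "if",   SK_IF },
    { "then", SK_THEN },
    { "fi",   SK_FI },
    { "{",    '{' },
    { NULL,   0 }
};

/* Return the token code for text, or -1 if text is not in the table. */
int sketch_find_token(const SKETCH_ALIST *alist, const char *text)
{
    assert(alist != NULL && text != NULL);
    for (; alist->word != NULL; alist++) {
        if (strcmp(alist->word, text) == 0)
            return alist->token;
    }
    return -1;
}

/* Reverse lookup: return the string for a token code, or NULL. */
const char *sketch_find_name(const SKETCH_ALIST *alist, int token)
{
    assert(alist != NULL);
    for (; alist->word != NULL; alist++) {
        if (alist->token == token)
            return alist->word;
    }
    return NULL;
}
```

    The reverse direction is handy for diagnostics, where a token code has to be turned back
    into readable text. Note that single-character tokens such as '{' simply use the
    character value as their code.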

    We will meet these tokens again in the grammar.

    Next come the type declarations for the grammar's nonterminals:

/* The types that the various syntactical units return. */

%type <command> inputunit command pipeline pipeline_command
%type <command> list list0 list1 compound_list simple_list simple_list1
%type <command> simple_command shell_command
%type <command> for_command select_command case_command group_command
%type <command> arith_command
%type <command> cond_command
%type <command> arith_for_command
%type <command> function_def function_body if_command elif_clause subshell
%type <redirect> redirection redirection_list
%type <element> simple_command_element
%type <word_list> word_list pattern
%type <pattern> pattern_list case_clause_sequence case_clause
%type <number> timespec
%type <number> list_terminator

%start inputunit

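    The names alone give a fair idea of what each one is for.
    %start says that the entry point of the whole grammar is the inputunit nonterminal.
    After that comes the grammar itself. It is long, so I will not paste it here.
    The grammar is the part I find most interesting: no book about bash is as direct as
    reading the code.
    We will go through it bit by bit in later posts.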

posted on 2010-07-25 10:19 by 糯米
