Morya

【翻译】ANTLR 3

ANTLR 3

by
R. Mark Volkmann, Partner/Software Engineer
Object Computing, Inc. (OCI)
翻译者:Morya

Preface

前言

ANTLR is a big topic, so this is a big article. The table of contents that follows contains hyperlinks to allow easy navigation to the many topics discussed. Topics are introduced in the order in which understanding them is essential to the example code that follows. Your questions and feedback are welcomed at mark@ociweb.com.

ANTLR 是一个很大的话题,所以,这篇也有点长。 下面这个列表里面包含了一些链接,它们指向本页其它位置,以方便浏览。 各个专题以一个在看过示例后能迅速理解的方式排列介绍。 有任何问题或者反馈,都欢迎致信 mark@ociweb.com 。

Table of Contents

内容列表


Part I - Overview

Introduction to ANTLR

ANTLR is a free, open source parser generator tool that is used to implement "real" programming languages and domain-specific languages (DSLs). The name stands for ANother Tool for Language Recognition. Terence Parr, a professor at the University of San Francisco, implemented (in Java) and maintains it. It can be downloaded from http://www.antlr.org. This site also contains documentation, articles, examples, a Wiki and information about mailing lists.

ANTLR 是一个免费,开源的解析器生成工具,它被用来实现“真正的”编程语言,和特殊语法语言(DSLs)。 ANTLR是 ANother Tool for Language Recognition 的缩写。 圣弗朗西斯科大学教授Terence Parr,(用Java) 实现并维护着这个工具。 下载地址:http://www.antlr.org。 这个站点有相关的文档、文章、示例,邮件列表,还有一个维基。

ANTLR home page

Many people feel that ANTLR is easier to use than other, similar tools. One reason for this is the syntax it uses to express grammars. Another is the existence of a graphical grammar editor and debugger called ANTLRWorks. Jean Bovet, a former master's student at the University of San Francisco who worked with Terence, implemented it (using Java Swing) and maintains it.

很多人都认为 ANTLR 比同类工具更具可用性。 其中一个原因在于它描述 grammar 的语法。 另一个是图形化的,可调试的 ANTLRWorks 文法编辑器的存在。 它由 Jean Bovet 使用 Java(Swing) 实现并维护。 他是在圣弗朗西斯科大学和Terence共事的一位former masters ?学生。

A brief word about conventions in this article... ANTLR grammar syntax makes frequent use of the characters [ ] and { }. When describing a placeholder we will use italics rather than surrounding it with { }. When describing something that's optional, we'll follow it with a question mark rather than surrounding it with [ ].

本文使用的标记符转换简略介绍... ANTLR 文法文件的语法,对 [ ] 和 { } 使用的比较频繁。 当描述一个占位符的时候,我们使用斜体字而不是把它用 { } 括起来。 描述可选部分的时候,我们使用 ? 后缀而不是用 [ ] 括起来。

ANTLR Overview

ANTLR uses Extended Backus-Naur Form (EBNF) grammars, which can directly express optional and repeated elements. BNF grammars require a more verbose syntax to express these. EBNF grammars also support "subrules", which are parenthesized groups of elements.

ANTLR 使用 Extended Backus-Naur 扩展巴克斯标记式 (EBNF) 文法,它可以直接表述 “可选”, “重复”元素。而 BNF 文法则需要更繁琐的语法来表达。 EBNF 文法也支持括号包含的元素组的子规则。

ANTLR supports infinite lookahead for selecting the rule alternative that matches the portion of the input stream being evaluated. The technical way of stating this is that ANTLR supports LL(*). An LL(k) parser is a top-down parser that parses from left to right, constructs a leftmost derivation of the input and looks ahead k tokens when selecting between rule alternatives. The * means any number of lookahead tokens. Another type of parser, LR(k), is a bottom-up parser that parses from left to right and constructs a rightmost derivation of the input. LL parsers can't handle left-recursive rules, so those must be avoided when writing ANTLR grammars. Most people find LL grammars easier to understand than LR grammars. See Wikipedia for more detailed descriptions of LL and LR parsers.

当多个规则符合输入的一部分内容时,ANTLR 支持无穷前看,以消除歧义。 用术语说就是 ANTLR 支持 LL(*)。 LL(k) parser 是一个自顶向下 parser ,它从左到右解析, 构建一个输入的最左推导,当遇到多个规则选择时,前看 k 个词素来决定。 * 代表前看任意多个词素。 另一种类型的parser,LR(k),自底向上 parser,从左到右解析,并且构建 一个输入的最右推导。 LL parsers 不能处理左递归规则,在写 ANTLR 文法时一定要避免。 多数人觉得 LL 文法比 LR 文法更容易理解。详细参考:维基百科的 LL 和 LR 解析器条目。
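
To make the left-recursion restriction concrete, here is a minimal hand-written sketch in Java of what a top-down parser does. It is not code generated by ANTLR; the class name LLSketch and its methods are hypothetical. A left-recursive rule such as expr: expr '+' NUMBER would turn into a method that calls itself before consuming any input, so the rule is assumed to be rewritten as expr: NUMBER ('+' NUMBER)*, which becomes a loop that looks one token ahead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of LL(1) parsing; this is not ANTLR output.
class LLSketch {
    private final List<String> tokens;
    private int pos = 0;

    LLSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Look ahead one token without consuming it.
    private String lookahead() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    // Consume the current token (throws on malformed input that ends early).
    private String consume() {
        return tokens.get(pos++);
    }

    // expr: NUMBER ('+' NUMBER)* ;
    int expr() {
        int sum = Integer.parseInt(consume());
        // The lookahead token decides whether to take the optional part again.
        while (lookahead().equals("+")) {
            consume();
            sum += Integer.parseInt(consume());
        }
        return sum;
    }

    static int parse(String... tokens) {
        return new LLSketch(Arrays.asList(tokens)).expr();
    }

    public static void main(String[] args) {
        System.out.println(parse("1", "+", "2", "+", "3")); // prints 6
    }
}
```

ANTLR's LL(*) strategy generalizes the single-token check in lookahead() to as many tokens as are needed to pick an alternative.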

ANTLR supports three kinds of predicates that aid in resolving ambiguities. These allow rules that are not based strictly on input syntax.

ANTLR 支持三种断言来解决歧义。 它们允许不是严格基于输入语法的规则。

While ANTLR is implemented in Java, it generates code in many target languages including Java, Ruby, Python, C, C++, C# and Objective C.

虽然 ANTLR 使用 Java 编写,它支持多种目标语言,包括 Java, Ruby, Python, C, C++, C# 和 Objective C。

There are IDE plug-ins available for working with ANTLR inside IDEA and Eclipse, but not yet for NetBeans or other IDEs.

IDEA 和 Eclipse 有相关的插件来支持 ANTLR,NetBeans 等 IDE 暂时还没有。

Use Cases

用例?

There are three primary use cases for ANTLR.

ANTLR 有三种主要的使用方法。

The first is implementing "validators." These generate code that validates that input obeys grammar rules.

第一种是实现“验证器”。 它检验输入文本是否符合文法规定的规则。

The second is implementing "processors." These generate code that validates and processes input. They can perform calculations, update databases, read configuration files into runtime data structures, etc. Our Math example coming up is an example of a processor.

第二种是实现 “处理器”。 它检验并处理输入文本。 可以进行计算,更新数据库,读取配置文件到内存中,等。 我们后面的Math示例就是一个处理器的例子。

The third is implementing "translators." These generate code that validates and translates input into another format such as a programming language or bytecode.

第三种就是“翻译器”。 它验证输入,并将输入翻译成另一种格式,比如编程语言或字节码。

Later we'll discuss "actions" and "rewrite rules." It's useful to point out where these are used in the three use cases above. Grammars for validators don't use actions or rewrite rules. Grammars for processors use actions, but not rewrite rules. Grammars for translators use actions (containing printlns) and/or rewrite rules.

晚点我们会讨论 “动作” 和 “重写规则”。 当然,明确这三种在何种情况使用会利于理解。 验证器不使用 “动作”和 “重写规则”。 处理器使用“动作”,但不使用“重写规则”。 翻译器使用“动作”,可能使用“重写规则”。 (包含 printlns)

Other DSL Approaches

Dynamic languages like Ruby and Groovy can be used to implement many DSLs. However, when they are used, the DSLs have to live within the syntax rules of the language. For example, such DSLs often require dots between object references and method names, parameters separated by commas, and blocks of code surrounded by curly braces or do/end keywords. Using a tool like ANTLR to implement a DSL provides maximum control over the syntax of the DSL.

待翻译…… 我也不懂……

Definitions

Lexer
converts a stream of characters to a stream of tokens (ANTLR token objects know their start/stop character stream index, line number, index within the line, and more)
把字符流转换成词素流, (ANTLR 词素对性知道它们自己的 start/stop 索引,行号,行中位置,等)
Parser
processes a stream of tokens, possibly creating an AST
处理词素流输入,并生成AST(可选)
Abstract Syntax Tree (AST)
an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens and can be efficiently processed multiple times
输入流的树表示格式,它比词素流更方便处理,且可以高效的多次处理。
Tree Parser
processes an AST
StringTemplate
a library that supports using templates with placeholders for outputting text (ex. Java source code)
一个支持占位符模板的库,用来输出文本(比如 Java 源文件)

An input character stream is fed into the lexer. The lexer converts this to a stream of tokens that is fed to the parser. The parser often constructs an AST, which is fed to the tree parser. The tree parser processes the AST and optionally produces text output, possibly using StringTemplate.

字符流送到 lexer,lexer 将它们转换到词素流,然后送到 parser。 paser 常常建立一个 AST ,并送到tree parser。 tree paser 处理 AST ,可能还会使用字符串模板生成文本输出。
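
The first stage of this flow can be pictured with a small hand-written Java sketch. It is only an illustration under stated assumptions: the classes TokenSketch and LexerSketch are hypothetical and far simpler than ANTLR's own token and lexer classes, and the three token types are made up for the example. It shows a character stream becoming tokens that remember their start and stop indexes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token: a type, the matched text and its character positions.
class TokenSketch {
    final String type;
    final String text;
    final int start;
    final int stop;

    TokenSketch(String type, String text, int start, int stop) {
        this.type = type;
        this.text = text;
        this.start = start;
        this.stop = stop;
    }
}

// Hypothetical lexer: turns characters into NUMBER, NAME and SYMBOL tokens.
class LexerSketch {
    static List<TokenSketch> tokenize(String input) {
        List<TokenSketch> tokens = new ArrayList<TokenSketch>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ') { // spaces separate tokens and are dropped
                i++;
                continue;
            }
            int start = i;
            String type;
            if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                type = "NUMBER";
            } else if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
                type = "NAME";
            } else {
                i++;
                type = "SYMBOL";
            }
            tokens.add(new TokenSketch(type, input.substring(start, i), start, i - 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (TokenSketch t : tokenize("a = 42")) {
            System.out.println(t.type + " '" + t.text + "' " + t.start + ".." + t.stop);
        }
    }
}
```

A parser would then consume this token list instead of the raw characters.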

ANTLR flow

General Steps

The general steps involved in using ANTLR include the following.

使用 ANTLR 大致有以下几步。

  • Write the grammar using one or more files.
    A common approach is to use three grammar files, each focusing on a specific aspect of the processing. The first is the lexer grammar, which creates tokens from text input. The second is the parser grammar, which creates an AST from tokens. The third is the tree parser grammar, which processes an AST. This results in three relatively simple grammar files as opposed to one complex grammar file.
  • 把文法定义写入一个或多个文件中
    一个通常的做法是使用三个文法文件,每个单独处理一个方面。 第一个是扫描器定义,从字符输入建立词素流。 第二个是解析器定义,从词素流建立AST。 第三个是树解析器定义,处理AST输入。 这样会产生三个相关的,但简化的文法文件,而不是一个单独的复杂大文件。
  • Optionally write StringTemplate templates for producing output.
  • 【可选】为输出编写字符串模板
  • Debug the grammar using ANTLRWorks.
  • 使用 ANTLRWorks 调试文法
  • Generate classes from the grammar. These validate that text input conforms to the grammar and execute target language "actions" specified in the grammar.
  • 从文法定义生成相关的处理类 这些类会验证输入文本是否符合文法定义并执行目标语言文法中指定的“动作”。
  • Write an application that uses the generated classes.
  • 使用生成的类完成完整程序。
  • Feed the application text that conforms to the grammar.
  • 给程序输入符合文法的文件。

Part II - Jumping In

Example Description

Enough background information, let's create a language!

现在已经有足够的背景信息,我们来创建一个语言吧!

Here's a list of features we want our language to have:

下面是对我们的语言的功能期待:

  • run on a file or interactively
  • 可以执行一个文件或者交互
  • get help using ? or help
  • 可以用 ? 或 help 取得帮助
  • support a single data type: double
  • 支持double数据类型
  • assign values to variables using syntax like a = 3.14
  • 可以使用类似 a = 3.14 的语法对一个变量进行赋值
  • define polynomial functions using syntax like f(x) = 3x^2 - 4x + 2
  • 使用如下语法定义一个多项式函数 f(x) = 3x^2 - 4x + 2
  • print strings, numbers, variables and function evaluations using syntax like
    print "The value of f for " a " is " f(a)
  • 使用如下语法打印字符串,数值,变量,函数的求值
    print "The value of f for " a " is " f(a)
  • print the definition of a function and its derivative using syntax like
    print "The derivative of " f() " is " f'()
  • 使用如下语法打印一个函数的定义和导函数
    print "The derivative of " f() " is " f'()
  • list variables and functions using list variables and list functions
  • 使用 list variables 和 list functions 列出变量和函数
  • add and subtract functions using syntax like h = f - g
    (note that the variables used in the functions do not have to match)
  • 使用如下的语法进行函数的加减 h = f - g
    (注意,函数中使用的变量不需要是一个变量实例)
  • exit using exit or quit
  • 使用 exit 或 quit 退出程序

Here's some example input.

下面是输入示例

 

a = 3.14
f(x) = 3x^2 - 4x + 2
print "The value of f for " a " is " f(a)
print "The derivative of " f() " is " f'()
list variables
list functions
g(y) = 2y^3 + 6y - 5
h = f + g
print h()

 

Here's the output that would be produced.

下面是理论输出

The value of f for 3.14 is 19.0188
The derivative of f(x) = 3x^2 - 4x + 2 is f'(x) = 6x - 4
# of variables defined: 1
a = 3.14
# of functions defined: 1
f(x) = 3x^2 - 4x + 2
h(x) = 2x^3 + 3x^2 + 2x - 3
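
The arithmetic behind this output can be checked with a short Java sketch. It is not the Function class used later in the article; PolynomialSketch and its array representation (coefficients indexed by exponent) are assumptions made only for this illustration.

```java
import java.util.Arrays;

// Hypothetical helper: a polynomial is a double[] where index = exponent.
class PolynomialSketch {
    // Evaluate using Horner's method.
    static double evaluate(double[] coeffs, double x) {
        double result = 0;
        for (int i = coeffs.length - 1; i >= 0; i--) {
            result = result * x + coeffs[i];
        }
        return result;
    }

    // d/dx of c*x^n is n*c*x^(n-1).
    static double[] derivative(double[] coeffs) {
        if (coeffs.length <= 1) return new double[] {0};
        double[] d = new double[coeffs.length - 1];
        for (int i = 1; i < coeffs.length; i++) {
            d[i - 1] = i * coeffs[i];
        }
        return d;
    }

    // Add two polynomials term by term.
    static double[] add(double[] a, double[] b) {
        double[] sum = new double[Math.max(a.length, b.length)];
        for (int i = 0; i < a.length; i++) sum[i] += a[i];
        for (int i = 0; i < b.length; i++) sum[i] += b[i];
        return sum;
    }

    public static void main(String[] args) {
        double[] f = {2, -4, 3};     // 3x^2 - 4x + 2
        double[] g = {-5, 6, 0, 2};  // 2y^3 + 6y - 5
        System.out.println(String.format("%.4f", evaluate(f, 3.14))); // 19.0188
        System.out.println(Arrays.toString(derivative(f)));          // [-4.0, 6.0], i.e. 6x - 4
        System.out.println(Arrays.toString(add(f, g)));              // [-3.0, 2.0, 3.0, 2.0]
    }
}
```

The last array reads as 2x^3 + 3x^2 + 2x - 3, matching the definition printed for h.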

Here's the AST we'd like to produce for the input above, drawn by ANTLRWorks. It's split into three parts because the image is really wide. The "nil" root node is automatically supplied by ANTLR. Note the horizontal line under the "nil" root node that connects the three graphics. Nodes with uppercase names are "imaginary nodes" added for the purpose of grouping other nodes. We'll discuss those in more detail later.

这里是我们将会从输入生成的AST,使用 ANTLRWorks 绘出。 它被分成三块,因为图片太宽了。 根节点 "nil" 由 ANTLR 自动提供。 注意,根节点 "nil" 下面的竖线是连着三张图片的………… 名字大写的节点是 “虚拟节点”。 为把其它节点分类而加。 我们晚点会详细讨论。


example ast diagram, part 1


example ast diagram, part 2


example ast diagram, part 3

Important Classes

The diagram below shows the relationships between the most important classes used in this example.

下面的图片展示了本示例中用到的几个最重要的类之间的关系。

Important classes

Note the key in the upper-left corner of the diagram that distinguishes between classes provided by ANTLR, classes generated from our grammar by ANTLR, and classes we wrote manually.

注意,在图片左上角的 key,指出了 ANTLR 提供的类,文法定义自动生成的类,和我们自己写的类。

Grammar Syntax

The syntax of an ANTLR grammar is described below.

ANTLR grammar 文件的语法在下面描述

grammar-type? grammar grammar-name;
grammar-options?
token-spec?
attribute-scopes?
grammar-actions?
rule+

Comments in an ANTLR grammar use the same syntax as Java. There are three types of grammars: lexer, parser and tree. If a grammar-type isn't specified, it defaults to a combined lexer and parser. The name of the file containing the grammar must match the grammar-name and have a ".g" extension. The classes generated by ANTLR will contain a method for each rule in the grammar. Each of the elements of grammar syntax above will be discussed in the order they are needed to implement our Math language.

ANTLR grammar 使用和Java相同的注释语法。( /* bb */ //aa ) grammar有三种: lexer, parser 和 tree。 如果为指定 grammar-type ,默认为 lexer 和 parser 混合 grammar 。 包含 grammar 的文件,其文件名必须和 grammar-name 完全一致(注意大小写), 而且,扩展名为 ".g" 。 ANTLR 生成的类会为文法里的每一个规则生成一个对应的函数。 上面讨论的语法元素,在需要实现Math语言而用到的时候,会引入详细说明。(唉,组织不好)

Grammar Options

Grammar options include the following:

grammar option包含下面几个:

AST node type - ASTLabelType = CommonTree
This is used in grammars that create or parse ASTs. CommonTree is a provided class. It is also possible to use your own class to represent AST nodes.
在生成或解析 AST 的 grammars 中使用。 CommonTree 是 ANTLR 内置的一个类。 也可以使用自定义的类来表述 AST 节点。
infinite lookahead - backtrack = true
无限前看 - backtrack = true
This provides infinite lookahead for all rules. Parsing is slower with this on.
对所有的规则提供无限前看。 开启后解析速度会变慢。
limited lookahead - k = integer
有限前看 - k = integer
This limits lookahead to a given number of tokens.
设定前看 k 个词素
output type - output = AST | template
输出类型 - output = AST | template
Choose template when using the StringTemplate library.
Don't set this when not producing output or doing so with printlns in actions.
使用StringTemplate的话,就选择 template
如果不进行输出,或者使用 println 进行输出就不要设置此选项。
token vocabulary - tokenVocab = grammar-name
This allows one grammar file to use tokens defined in another (with lexer rules or a token spec.). It reads generated .tokens files.
允许读取另一个 grammar 文件定义的词素。 (with lexer rules or a token spec.). 它读取生成的 .token 文件。

Grammar options are specified using the following syntax. Note that quotes aren't needed around single word values.

Grammar option 使用如下语法来指定。 注意,引用(我也不晓得)不需要用单引号包含。

 

options {
  name = 'value';
  ...
}

 

Grammar Actions

Grammar actions add code to the generated code. There are three places where code can be added by a grammar action.

Grammar aciton 会在生成的代码里添加一些内容。 用 grammar action 可以添加代码到三个地方。

  1. Before the generated class definition:
    This is commonly used to specify a Java package name and import classes in other packages. The syntax for adding code here is @header { ... }. In a combined lexer/parser grammar, this only affects the generated parser class. To affect the generated lexer class, use @lexer::header { ... }.
    在生成类的定义位置前:
    通常用于指定 Java package 或者 import 其它 package 里的 class。 使用语法 @header { ... }。 在混合的 lexer/parser grammar 内,这样只会影响生成的parser类。 要对 lexer 类也起作用,需要使用 @lexer::header { ... }
  2. Inside the generated class definition:
    This is commonly used to define constants, attributes and methods accessible to all rule methods in the generated classes. It can also be used to override methods in the superclasses of the generated classes.
    The syntax for adding code here is @members { ... }. In a combined lexer/parser grammar, this only affects the generated parser class. To affect the generated lexer class, use @lexer::members { ... }.
    在生成类的定义内:
    通常用于定义常数,属性或者一些可以访问所有 rule 生成函数的方法。 也可以用来重载生成类超类的函数。
    使用语法 @members { ... }。 在混合的 lexer/parser grammar 内,这样只会影响生成的parser类。 要对 lexer 类也起作用,需要使用 @lexer::members { ... }
  3. Inside generated methods:
    The catch blocks for the try block in the methods generated for each rule can be customized. One use for this is to stop processing after the first error is encountered rather than attempting to recover by skipping unrecognized tokens.
    The syntax for adding catch blocks is @rulecatch { catch-blocks }
    在生成的函数内:
    每个规则函数的异常处理块都可以自定义。 其中一个用处就是,当遇到第一个错误的时候,就停止,而不是掠过无法识别的词素尝试恢复。
    使用语法 @rulecatch { catch-blocks }

Part III - Lexers

Lexer Rules

A lexer rule or token specification is needed for every kind of token to be processed by the parser grammar. The names of lexer rules must start with an uppercase letter and are typically all uppercase. A lexer rule can be associated with:

一个 parser 需要处理的 token 必须要有相应的规则或 token 规格。 laxer 规则名必须以大写字母开头,最好是按惯例全部大写。 一个 lexer 规则可有以下几种

  • a single literal string expected in the input
  • 期待一个单独的文本字符串
  • a selection of literal strings that may be found
  • 可选的文本字符串
  • a sequence of specific characters and ranges of characters using the cardinality indicators ?, * and +
  • 一串特定顺序的字符范围和数量指定符 ?, *,+

A lexer rule cannot be associated with a regular expression.

lexer rule 不能用 正则表达式 指定。

When the lexer chooses the next lexer rule to apply, it chooses the one that matches the most characters. If there is a tie then the one listed first is used, so order matters.

当 lexer 发现输入流符合同时符合两个 rule,它会选择匹配更长的。 如果有两个都匹配输入,先出现在 grammar 中的会被匹配。所以,顺序前后是有差别的。
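
The selection policy described above (longest match wins, and the first rule listed wins a tie) can be mimicked in a few lines of Java. This is only a sketch of the policy, not of how ANTLR's generated lexer works internally. It uses java.util.regex purely as a convenient matcher, and the two rules LIST and NAME are simplified stand-ins chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "longest match, then first listed".
class LongestMatchSketch {
    // LinkedHashMap keeps the rules in the order they were listed.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<String, Pattern>();
    static {
        RULES.put("LIST", Pattern.compile("list"));
        RULES.put("NAME", Pattern.compile("[a-z]+"));
    }

    // Returns the name of the rule chosen at the start of the input,
    // or null if no rule matches.
    static String choose(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strictly greater: on a tie the earlier rule is kept.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(choose("list"));    // LIST (tie, listed first)
        System.out.println(choose("listing")); // NAME (matches more characters)
    }
}
```

Swapping the order of the two entries would make "list" come back as NAME, which is why rule order matters.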

A lexer rule can refer to other lexer rules. Often they reference "fragment" lexer rules. These do not result in creation of tokens and are only present to simplify the definition of other lexer rules. In the example ahead, LETTER and DIGIT are fragment lexer rules.

lexer rule 可以引用其它 lexer rule。 经常它们引用 "fragment" lexer rule。 那些 rule 不产生实际的 token 标记,它们只是为了简化定义 lexer rule 而存在。 在前面的 example,LETTER 和 DIGIT 是 fragment lexer rule。

Whitespace and Comments

Whitespace and comments in the input are handled in lexer rules. There are two common options for handling these: either throw them away or write them to a different "channel" that is not automatically inspected by the parser. To throw them away, use "skip();". To write them to the special "hidden" channel, use "$channel = HIDDEN;".

输入中的空白字符和注释在 lexer rule 里处理。 有两个常用的 option 来处理这些: 干脆丢弃,或者输入到另一个 parser 不会自动关注的 "channel"。 丢弃的话,使用 "skip();" 写入 "hidden" 通道: "$channel = HIDDEN;" 。
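
The difference between skipping input and hiding it can be sketched in plain Java. The names below (ChannelSketch, Tok, parserView) and the channel numbers are hypothetical and chosen only for this illustration; the real mechanism lives in ANTLR's runtime token stream classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of skipped versus hidden tokens.
class ChannelSketch {
    static final int DEFAULT = 0;
    static final int HIDDEN = 1; // arbitrary value for this sketch

    static class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // One token per character; spaces go to the hidden channel and
    // a trailing "//" comment is skipped, producing no token at all.
    static List<Tok> lex(String line) {
        List<Tok> all = new ArrayList<Tok>();
        for (int i = 0; i < line.length(); i++) {
            if (line.startsWith("//", i)) {
                break; // like skip(): the comment leaves no trace
            }
            char c = line.charAt(i);
            all.add(new Tok(String.valueOf(c), c == ' ' ? HIDDEN : DEFAULT));
        }
        return all;
    }

    // What a parser would see: only tokens on the default channel.
    static String parserView(List<Tok> all) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : all) {
            if (t.channel == DEFAULT) sb.append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> all = lex("a = 3 // note");
        System.out.println(all.size());      // 6: the hidden spaces are still stored
        System.out.println(parserView(all)); // a=3
    }
}
```

Hidden tokens remain available to code that asks for them, for example a tool that needs to reproduce the original spacing, while skipped input is gone.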

Here are examples of lexer rules that handle whitespace and comments.

下面是一些处理空白和注释的 lexer 规则例子。

 

// Send runs of space and tab characters to the hidden channel.
WHITESPACE: (' ' | '\t')+ { $channel = HIDDEN; };
// Treat runs of newline characters as a single NEWLINE token.
// On some platforms, newlines are represented by a \n character.
// On others they are represented by a \r and a \n character.
NEWLINE: ('\r'? '\n')+;
// Single-line comments begin with //, are followed by any characters
// other than those in a newline, and are terminated by newline characters.
SINGLE_COMMENT: '//' ~('\r' | '\n')* NEWLINE { skip(); };
// Multi-line comments are delimited by /* and */
// and are optionally followed by newline characters.
MULTI_COMMENT options { greedy = false; }
  : '/*' .* '*/' NEWLINE? { skip(); };

 

When the greedy option is set to true, the lexer matches as much input as possible. When false, it stops when input matches the next element in the lexer rule. The greedy option defaults to true except when the patterns ".*" and ".+" are used. For this reason, it didn't need to be specified in the example above.

当开启贪婪模式的时候, lexer 会尽可能多的匹配字符。 关闭时,找到 lexer rule 里规定的下一个元素就会停止。 贪婪模式默认开启,除非使用了 ".*" 和 ".+" 模式。 因此,在上面的例子中,不需要再特别指定。
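
Java regular expressions have the same greedy and non-greedy distinction, so they make a convenient analogy even though ANTLR lexer rules are not regular expressions. The sketch below is an analogy only; GreedySketch and firstMatch are hypothetical names. It shows why a multi-line comment rule must not be greedy when the input contains more than one comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogy only: greedy versus non-greedy matching with java.util.regex.
class GreedySketch {
    // Returns the first match of the regex in the input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "/* one */ x /* two */";
        // Greedy: runs to the last "*/" and swallows the code in between.
        System.out.println(firstMatch("/\\*.*\\*/", input));  // /* one */ x /* two */
        // Non-greedy: stops at the first "*/".
        System.out.println(firstMatch("/\\*.*?\\*/", input)); // /* one */
    }
}
```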

If newline characters are to be used as statement terminators then they shouldn't be skipped or hidden since the parser needs to see them.

如果换行作为描述块的结束符,不要丢弃或隐藏它,因为 parser 需要用到。

Our Lexer Grammar

lexer grammar MathLexer;
// We want the generated lexer class to be in this package.
@header { package com.ociweb.math; }
APOSTROPHE: '\''; // for derivative
ASSIGN: '=';
CARET: '^'; // for exponentiation
FUNCTIONS: 'functions'; // for list command
HELP: '?' | 'help';
LEFT_PAREN: '(';
LIST: 'list';
PRINT: 'print';
RIGHT_PAREN: ')';
SIGN: '+' | '-';
VARIABLES: 'variables'; // for list command
NUMBER: INTEGER | FLOAT;
fragment FLOAT: INTEGER '.' '0'..'9'+;
fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
NAME: LETTER (LETTER | DIGIT | '_')*;
STRING_LITERAL: '"' NONCONTROL_CHAR* '"';
fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL | SPACE;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SPACE: ' ' | '\t';
// Note that SYMBOL does not include the double-quote character.
fragment SYMBOL: '!' | '#'..'/' | ':'..'@' | '['..'`' | '{'..'~';
// Windows uses \r\n. UNIX and Mac OS X use \n.
// To use newlines as a terminator,
// they can't be written to the hidden channel!
NEWLINE: ('\r'? '\n')+;
WHITESPACE: SPACE+ { $channel = HIDDEN; };

We'll be looking at the parser grammar soon. When parser rule alternatives contain literal strings, they are converted into references to automatically generated lexer rules. For example, we could eliminate the ASSIGN lexer rule above and change ASSIGN to '=' in the parser grammar.

我们马上就会看到 parser 的 grammar 。 当 parser 的 规则中包含文本字符串时,它们会被自动生成转换成 lexer rule ,然后再引用它。 比如,我们可以把上面例子里的 ASSIGN 删掉,然后把 parser grammar 里的 ASSIGN 换成 '='


Part IV - Parsers

Token Specifications

The lexer creates tokens for all input character sequences that match lexer rules. It can be useful to create other tokens that either don't exist in the input (imaginary) or have a better name than what is found in the input. Imaginary tokens are often used to group other tokens. In the parser grammar ahead, the tokens that play this role are DEFINE, POLYNOMIAL, TERM, FUNCTION, DERIVATIVE and COMBINE.

Lexer 会为所有符合 lexer rule 的输入字符流创建词素。 下面的做法会很有用处,为输入中并不存在的单元创建词素,或使用比输入中出现的更好理解的名字。 虚拟词素经常被用来合并其它的词素。 在后面的 parser grammar 中,扮演这种角色的有 DEFINE, POLYNOMIAL, TERM, FUNCTION, DERIVATIVE 和 COMBINE。

The syntax for specifying these kinds of tokens in a parser grammar is:

在 parser grammar 中指定这种类型的词素的语法如下:

 

tokens {
  imaginary-name;
  better-name = 'input-name';
}

 

Rule Syntax

The syntax for defining rules is

定义一个 rule 的语法

fragment? rule-name arguments?
(returns return-values)?
throws-spec?
rule-options?
rule-attribute-scopes?
rule-actions?
  : token-sequence-1
  | token-sequence-2
  ...
  ;
exceptions-spec?

The fragment keyword only appears at the beginning of lexer rules that are used as fragments (described earlier).

关键词 fragment 只会出现在 lexer 规则前面,它们会被作为 fragment (前面描述过了)。

Rule options include backtrack and k which customize those options for a specific rule instead of using the grammar-wide values specified as grammar options. They are specified using the syntax options { ... }.

规则配置里面包含 backtrack 和 k,它们为特定的规则单独指定取值,以代替 grammar 级 option 指定的全局值。它们使用如下语法来指定:options { ... }

The token sequences are alternatives that can be selected by the rule. Each element in the sequences can be followed by an action which is target language code (such as Java) in curly braces. The code is executed immediately after a preceding element is matched by input.

词素顺序列被用来选择各个规则。序列中的每一个元素都可以附加一个 action,它被用{ } 包围的目标语言(比如 Java)编写。在之前的元素被输入流匹配之后该代码会立即被执行。

The optional exceptions-spec customizes exception handling for this rule.

可选的 exceptions-spec 此规则的异常处理。

Elements in a token sequence can be assigned to variables so they can be accessed in actions. To obtain the text value of a token that is referred to by a variable, use $variable.text. There are several examples of this in the parser grammar that follows.

一个词素中的元素可以被赋值给变量以利于它们可以被 action 访问。要读取一个变量指向词素的文本值,要使用 $variable.text。后面的几个例子中都有这种使用情况。

Creating ASTs

Parser grammars often create ASTs. To do this, the grammar option output must be set to AST.

Parser grammar 经常创建 AST。这样的话,grammar option output 必须被设置成 AST

There are two approaches for creating ASTs. The first is to use "rewrite rules". These appear after a rule alternative. This is the recommended approach in most cases. The syntax of a rewrite rule is
-> ^(parent child-1 child-2 ... child-n)

有两种办法来创建 AST。第一种是使用 "rewrite rules"。这种出现在一个规则的分支后面。这是大多数情况的推荐做法。Rewrite rule 语法 -> ^(parent child-1 child-2 ... child-n)

The second approach for creating ASTs is to use AST operators. These appear in a rule alternative, immediately after tokens. They work best for sequences like mathematical expressions. There are two AST operators. When a ^ is used, a new root node is created for all child nodes at the same level. When a ! is used, no node is created. This is often used for bits of syntax that aren't needed in the AST such as parentheses, commas and semicolons. When a token isn't followed by one of them, a new child node is created for that token using the current root node as its parent.

第二种创建AST语法树的办法是使用 AST 操作符。它们出现在一个规则的分支中间,紧跟词素后面。在数学表达式中最适用。 有两种 AST 操作符。使用 ^ 时,一个新的父(?不确定)节点会被创建给所用同一等级的子节点。使用 ! 时,不创建节点。这种经常用在一些不需要出现在 AST 中的语法片段,比如括号、逗号、和分号。当一个词素后面没有跟任何后缀时,默认创建一个当前根节点的子节点。
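
The effect of the ^ operator is easier to see with a concrete tree. The Java sketch below imitates what a hypothetical rule expr: NAME (SIGN^ NAME)* would build. The rule, the class AstSketch and its methods are assumptions made for illustration; ANTLR's own tree nodes are CommonTree objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node used to imitate the ^ AST operator.
class AstSketch {
    final String text;
    final List<AstSketch> children = new ArrayList<AstSketch>();

    AstSketch(String text) {
        this.text = text;
    }

    // LISP-like rendering: (root child1 child2 ...)
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (AstSketch child : children) {
            sb.append(' ').append(child.toStringTree());
        }
        return sb.append(')').toString();
    }

    // Imitates: expr: NAME (SIGN^ NAME)* ;
    // Expects tokens in the form NAME, SIGN, NAME, SIGN, NAME, ...
    static AstSketch build(String... tokens) {
        AstSketch root = new AstSketch(tokens[0]);
        for (int i = 1; i + 1 < tokens.length; i += 2) {
            AstSketch newRoot = new AstSketch(tokens[i]);       // SIGN^ becomes the new root
            newRoot.children.add(root);                         // the tree so far is its first child
            newRoot.children.add(new AstSketch(tokens[i + 1])); // the next NAME is its second child
            root = newRoot;
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(build("a", "+", "b", "-", "c").toStringTree()); // (- (+ a b) c)
    }
}
```

A token marked with ! would simply never be turned into a node in build.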

A rule can use both of these approaches, but each rule alternative can only use one approach.

一个规则可以同时使用两种方法,但对于一个规则分支只能使用一种。

Rule Arguments and Return Values

The following syntax is used to declare rule arguments and return types.

下面的语法用来声明规则参数和返回值。

rule-name[type1 name1, type2 name2, ...]
returns [type1 name1, type2 name2, ...] :
  ...
  ;

The names after the rule name are arguments and the names after the returns keyword are return values.

规则名后面的名字是参数类型和参数变量名,而 returns 关键字后面的则是返回值类型和返回值变量名。

Note that rules can return more than one value. ANTLR generates a class to use as the return type of the generated method for the rule. Instances of this class hold all the return values. The generated method name matches the rule name. The name of the generated return type class is the rule name with "_return" appended.

注意,规则可以返回多个值。 ANTLR 会生成一个类用来做该规则生成方法的返回值类型。该类的实例则持有所有返回值。规则生成的方法名,或称函数名和该规则名完全一致。 而生成的返回值类型名则会是规则名后加上"_return"。
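
As a rough picture of this, consider a hypothetical rule declared as range returns [int min, int max]. The Java sketch below is hand-written and only imitates the shape described above: a holder class named after the rule with "_return" appended, and a method named after the rule that returns it. The real generated class has more to it (for example, it extends a runtime base class), and the computation inside range here is invented for the example.

```java
// Hand-written imitation of the shape of generated code; not ANTLR output.
class ReturnScopeSketch {
    // Stand-in for the return type generated for a rule named "range".
    static class range_return {
        int min;
        int max;
    }

    // Stand-in for the method generated for the rule.
    // Assumes a non-empty array.
    static range_return range(int[] values) {
        range_return retval = new range_return();
        retval.min = values[0];
        retval.max = values[0];
        for (int v : values) {
            if (v < retval.min) retval.min = v;
            if (v > retval.max) retval.max = v;
        }
        return retval;
    }

    public static void main(String[] args) {
        range_return r = range(new int[] {3, 1, 4});
        System.out.println(r.min + ".." + r.max); // 1..4
    }
}
```

In a grammar action, the caller would read the individual values from the returned object, one field per declared return value.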

Our Parser Grammar

我们的解析器定义如下

parser grammar MathParser;
options {
  
  // We're going to output an AST.
  output = AST;
  // We're going to use the tokens defined in our MathLexer grammar.
  tokenVocab = MathLexer;
}
// These are imaginary tokens that will serve as parent nodes
// for grouping other tokens in our AST.
tokens {
  COMBINE;
  DEFINE;
  DERIVATIVE;
  FUNCTION;
  POLYNOMIAL;
  TERM;
}
// We want the generated parser class to be in this package.
@header { package com.ociweb.math; }
// This is the "start rule".
// EOF is a predefined token that represents the end of input.
// The "start rule" should end with this.
// Note the use of the ! AST operator
// to avoid adding the EOF token to the AST.
script: statement* EOF!;
statement: assign | define | interactiveStatement | combine | print;
// These kinds of statements only need to be supported
// when reading input from the keyboard.
interactiveStatement: help | list;
// Examples of input that match this rule include
// "a = 19", "a = b", "a = f(2)" and "a = f(b)".
assign: NAME ASSIGN value terminator -> ^(ASSIGN NAME value);
value: NUMBER | NAME | functionEval;
// A parenthesized group in a rule alternative is called a "subrule".
// Examples of input that match this rule include "f(2)" and "f(b)".
functionEval
  : fn=NAME LEFT_PAREN (v=NUMBER | v=NAME) RIGHT_PAREN -> ^(FUNCTION $fn $v);
// EOF cannot be used in lexer rules, so we made this a parser rule.
// EOF is needed here for interactive mode where each line entered ends in EOF
// and for file mode where the last line ends in EOF.
terminator: NEWLINE | EOF;
// Examples of input that match this rule include
// "f(x) = 3x^2 - 4" and "g(x) = y^2 - 2y + 1".
// Note that two parameters are passed to the polynomial rule.
define
  : fn=NAME LEFT_PAREN fv=NAME RIGHT_PAREN ASSIGN
    polynomial[$fn.text, $fv.text] terminator
    -> ^(DEFINE $fn $fv polynomial);
// Examples of input that match this rule include
// "3x^2 - 4" and "y^2 - 2y + 1".
// fnt = function name text; fvt = function variable text
// Note that two parameters are passed in each invocation of the term rule.
polynomial[String fnt, String fvt]
  : term[$fnt, $fvt] (SIGN term[$fnt, $fvt])*
    -> ^(POLYNOMIAL term (SIGN term)*);
// Examples of input that match this rule include
// "4", "4x", "x^2" and "4x^2".
// fnt = function name text; fvt = function variable text
term[String fnt, String fvt]
  
  // tv = term variable
  : c=coefficient? (tv=NAME e=exponent?)?
    // What follows is a validating semantic predicate.
    // If it evaluates to false, a FailedPredicateException will be thrown.
    // It is testing whether the term variable matches the function variable.
    { tv == null ? true : ($tv.text).equals($fvt) }?
    -> ^(TERM $c? $tv? $e?)
  ;
  // This catches bad function definitions such as
  // f(x) = 2y
  catch [FailedPredicateException fpe] {
    String tvt = $tv.text;
    String msg = "In function \"" + fnt +
      "\" the term variable \"" + tvt +
      "\" doesn't match function variable \"" + fvt + "\".";
    throw new RuntimeException(msg);
  }
coefficient: NUMBER;
// An example of input that matches this rule is "^2".
exponent: CARET NUMBER -> NUMBER;
// Inputs that match this rule are "?" and "help".
help: HELP terminator -> HELP;
// Inputs that match this rule include
// "list functions" and "list variables".
list
  : LIST listOption terminator -> ^(LIST listOption);
// Inputs that match this rule are "functions" and "variables".
listOption: FUNCTIONS | VARIABLES;
// Examples of input that match this rule include
// "h = f + g" and "h = f - g".
combine
  : fn1=NAME ASSIGN fn2=NAME op=SIGN fn3=NAME terminator
    -> ^(COMBINE $fn1 $op $fn2 $fn3);
// An example of input that matches this rule is
// print "f(" a ") = " f(a)
print
  : PRINT printTarget* terminator -> ^(PRINT printTarget*);
// Examples of input that match this rule include
// 19, 3.14, "my text", a, f(), f(2), f(a) and f'().
printTarget
  : NUMBER -> NUMBER
  | sl=STRING_LITERAL -> $sl
  | NAME -> NAME
  // This is a function reference to print a string representation.
  | NAME LEFT_PAREN RIGHT_PAREN -> ^(FUNCTION NAME)
  | functionEval
  | derivative
  ;
// An example of input that matches this rule is "f'()".
derivative
  : NAME APOSTROPHE LEFT_PAREN RIGHT_PAREN -> ^(DERIVATIVE NAME);

Part V - Tree Parsers

Part V - 树解析器

Rule Actions

Rule actions add code before and/or after the generated code in the method generated for a rule. They can be used for AOP-like wrapping of methods. The syntax @init { ...code... } inserts the contained code before the generated code. The syntax @after { ...code... } inserts the contained code after the generated code. The tree grammar rules polynomial and term ahead demonstrate using @init.

Rule Action 可以在自动生成的规则函数代码前或者后面加入自定义代码。 可以被用在 AOP 类似的包装函数中。 下面的语法 @init { ...code... } 会在生成代码前面加入其中包含的代码。 语法 @after { ...code... } 则会在生成代码后面加上其中包含的代码。 后面的树文法规则 polynomial 和 term 演示了 @init 的使用。

Attribute Scopes

属性域

Data is shared between rules in two ways: by passing parameters and/or returning values, or by using attributes. These are the same as the options for sharing data between Java methods in the same class. Attributes can be accessible to a single rule (using @init to declare them), a rule and all rules invoked by it (rule scope), or by all rules that request the named global scope of the attributes.

有两种办法在规则之间共享数据:传递参数或者返回值,或使用属性。和 Java 在同一个类内部共享数据的方案一样。 Attributes can be accessible to a single rule (using @init to declare them), a rule and all rules invoked by it (rule scope), or by all rules that request the named global scope of the attributes.

Attribute scopes define collections of attributes that can be accessed by multiple rules. There are two kinds, global and rule scopes.

属性域定义其它规则可以访问的各种属性。有两种全局和规则级的属性域。

Global scopes are named scopes that are defined outside any rule. To request access to a global scope within a rule, add "scope name;" to the rule. To access multiple global scopes, list their names separated by spaces. The following syntax is used to define a global scope.

全局属性域定义的属性在任何规则(函数)之外。在规则(函数)内访问全局属性,,需要给规则添加 scope name; 。 要访问多个全局属性域, 列出所有用空格分隔的名字。下面的语法用来定义一个全局属性域。

scope name {
  type variable;
}

Rule scopes are unnamed scopes that are defined inside a rule. Rule actions in the defining rule and rules invoked by it access attributes in the scope with $rule-name::variable. The following syntax is used to define a rule scope.

规则域是未命名的域,在规则内部定义。 当前定义的规则的 action 中和当前规则应用的规则,使用如下语法访问属性 $rule-name::variable 。 下面的语法用来定义一个规则域。

scope {
  type variable;
}

 

To initialize an attribute, use an @init rule action.

要初始化一个属性,使用规则 action @init
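
One way to picture a rule scope is as a stack of small objects: the defining rule pushes a fresh set of attributes when it starts and pops it when it finishes, and any rule it invokes reads the top of the stack. The Java sketch below illustrates that idea only. The names ScopeSketch, PolynomialScope and functionVariable are hypothetical, and the code is not what ANTLR generates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of a rule scope shared with invoked rules.
class ScopeSketch {
    // Attributes declared in the scope of a rule named "polynomial".
    static class PolynomialScope {
        String functionVariable;
    }

    private final Deque<PolynomialScope> polynomialStack = new ArrayDeque<PolynomialScope>();

    // The defining rule: push a scope on entry, pop it on exit.
    String polynomial(String variable) {
        PolynomialScope scope = new PolynomialScope();
        scope.functionVariable = variable; // the kind of setup @init would do
        polynomialStack.push(scope);
        try {
            return term();
        } finally {
            polynomialStack.pop();
        }
    }

    // An invoked rule: reads what $polynomial::functionVariable would refer to.
    String term() {
        return polynomialStack.peek().functionVariable;
    }

    // Number of scopes currently active.
    int depth() {
        return polynomialStack.size();
    }

    public static void main(String[] args) {
        ScopeSketch s = new ScopeSketch();
        System.out.println(s.polynomial("x")); // x
        System.out.println(s.depth());         // 0
    }
}
```

Using a stack rather than a single field means a rule that invokes itself recursively gets a separate set of attributes for each invocation.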

Our Tree Grammar

我们的树文法定义

tree grammar MathTree;
options {
  
  // We're going to process an AST whose nodes are of type CommonTree.
  ASTLabelType = CommonTree;
  // We're going to use the tokens defined in
  // both our MathLexer and MathParser grammars.
  // The MathParser grammar already includes
  // the tokens defined in the MathLexer grammar.
  tokenVocab = MathParser;
}
@header {
  
// We want the generated parser class to be in this package.
  package com.ociweb.math;
  
import java.util.Map;
  
import java.util.TreeMap;
}
// We want to add some fields and methods to the generated class.
@members {
  // We're using TreeMaps so the entries are sorted on their keys
  // which is desired when listing them.
  private Map<String, Function> functionMap = new TreeMap<String, Function>();
  private Map<String, Double> variableMap = new TreeMap<String, Double>();

  // This adds a Function to our function Map.
  private void define(Function function) {
    functionMap.put(function.getName(), function);
  }

  // This retrieves a Function from our function Map
  // whose name matches the text of a given AST tree node.
  private Function getFunction(CommonTree nameNode) {
    String name = nameNode.getText();
    Function function = functionMap.get(name);
    if (function == null) {
      String msg = "The function \"" + name + "\" is not defined.";
      throw new RuntimeException(msg);
    }
    return function;
  }

  // This evaluates a function whose name matches the text
  // of a given AST tree node for a given value.
  private double evalFunction(CommonTree nameNode, double value) {
    return getFunction(nameNode).getValue(value);
  }

  // This retrieves the value of a variable from our variable Map
  // whose name matches the text of a given AST tree node.
  private double getVariable(CommonTree nameNode) {
    String name = nameNode.getText();
    Double value = variableMap.get(name);
    if (value == null) {
      String msg = "The variable \"" + name + "\" is not set.";
      throw new RuntimeException(msg);
    }
    return value;
  }

  // This just shortens the code for print calls.
  private static void out(Object obj) {
    System.out.print(obj);
  }

  // This just shortens the code for println calls.
  private static void outln(Object obj) {
    System.out.println(obj);
  }

  // This converts the text of a given AST node to a double.
  private double toDouble(CommonTree node) {
    double value = 0.0;
    String text = node.getText();
    try {
      value = Double.parseDouble(text);
    } catch (NumberFormatException e) {
      throw new RuntimeException("Cannot convert \"" + text + "\" to a double.");
    }
    return value;
  }

  // This replaces all escaped newline characters in a String
  // with unescaped newline characters.
  // It is used to allow newline characters to be placed in
  // literal Strings that are passed to the print command.
  private static String unescape(String text) {
    // The regular expression matches a backslash followed by "n".
    return text.replaceAll("\\\\n", "\n");
  }
} // @members

script: statement*;

statement: assign | combine | define | interactiveStatement | print;

// These kinds of statements only need to be supported
// when reading input from the keyboard.
interactiveStatement: help | list;

// This adds a variable to the map.
// Parts of rule alternatives can be assigned to variables (ex. v)
// that are used to refer to them in rule actions.
// Alternatively rule and token names (ex. value, NAME) can be used.
// We could have used $value in place of $v below.
assign: ^(ASSIGN NAME v=value) { variableMap.put($NAME.text, $v.result); };

// This returns a value as a double.
// The value can be a number, a variable name or a function evaluation.
value returns [double result]
  : NUMBER { $result = toDouble($NUMBER); }
  | NAME { $result = getVariable($NAME); }
  | functionEval { $result = $functionEval.result; }
  ;

// This returns the result of a function evaluation as a double.
functionEval returns [double result]
  : ^(FUNCTION fn=NAME v=NUMBER) {
      $result = evalFunction($fn, toDouble($v));
    }
  | ^(FUNCTION fn=NAME v=NAME) {
      $result = evalFunction($fn, getVariable($v));
    }
  ;

// This builds a Function object and adds it to the function map.
define
  : ^(DEFINE name=NAME variable=NAME polynomial) {
      define(new Function($name.text, $variable.text, $polynomial.result));
    }
  ;

// This builds a Polynomial object and returns it.
polynomial returns [Polynomial result]
// The "current" attribute in this rule scope is visible to
// rules invoked by this one, such as term.
scope { Polynomial current; }
@init { $polynomial::current = new Polynomial(); }
  // There can be no sign in front of the first term,
  // so "" is passed to the term rule.
  // The coefficient of the first term can be negative.
  // The sign between terms is passed to
  // subsequent invocations of the term rule.
  : ^(POLYNOMIAL term[""] (s=SIGN term[$s.text])*) {
      $result = $polynomial::current;
    }
  ;

// This builds a Term object and adds it to the current Polynomial.
term[String sign]
@init { boolean negate = "-".equals(sign); }
  : ^(TERM coefficient=NUMBER) {
      double c = toDouble($coefficient);
      if (negate) c = -c; // applies sign to coefficient
      $polynomial::current.addTerm(new Term(c));
    }
  | ^(TERM coefficient=NUMBER? variable=NAME exponent=NUMBER?) {
      double c = coefficient == null ? 1.0 : toDouble($coefficient);
      if (negate) c = -c; // applies sign to coefficient
      double exp = exponent == null ? 1.0 : toDouble($exponent);
      $polynomial::current.addTerm(new Term(c, $variable.text, exp));
    }
  ;

// This outputs help on our language which is useful in interactive mode.
help
  : HELP {
      outln("In the help below");
      outln("* fn stands for function name");
      outln("* n stands for a number");
      outln("* v stands for variable");
      outln("");
      outln("To define");
      outln("* a variable: v = n");
      outln("* a function from a polynomial: fn(v) = polynomial-terms");
      outln("  (for example, f(x) = 3x^2 - 4x + 1)");
      outln("* a function from adding or subtracting two others: " +
        "fn3 = fn1 +|- fn2");
      outln("  (for example, h = f + g)");
      outln("");
      outln("To print");
      outln("* a literal string: print \"text\"");
      outln("* a number: print n");
      outln("* the evaluation of a function: print fn(n | v)");
      outln("* the definition of a function: print fn()");
      outln("* the derivative of a function: print fn'()");
      outln("* multiple items on the same line: print i1 i2  in");
      outln("");
      outln("To list");
      outln("* variables defined: list variables");
      outln("* functions defined: list functions");
      outln("");
      outln("To get help: help or ?");
      outln("");
      outln("To exit: exit or quit");
    }
  ;

// This lists all the functions or variables that are currently defined.
list
  : ^(LIST FUNCTIONS) {
      outln("# of functions defined: " + functionMap.size());
      for (Function function : functionMap.values()) {
        outln(function);
      }
    }
  | ^(LIST VARIABLES) {
      outln("# of variables defined: " + variableMap.size());
      for (String name : variableMap.keySet()) {
        double value = variableMap.get(name);
        outln(name + " = " + value);
      }
    }
  ;

// This adds or subtracts two functions to create a new one.
combine
  : ^(COMBINE fn1=NAME op=SIGN fn2=NAME fn3=NAME) {
      Function f2 = getFunction(fn2);
      Function f3 = getFunction(fn3);
      if ("+".equals($op.text)) {
        // "$fn1.text" is the name of the new function to create.
        define(f2.add($fn1.text, f3));
      } else if ("-".equals($op.text)) {
        define(f2.subtract($fn1.text, f3));
      } else {
        // This should never happen since SIGN is defined to be either "+" or "-".
        throw new RuntimeException(
          "The operator \"" + $op.text +
          "\" cannot be used for combining functions.");
      }
    }
  ;

// This prints a list of printTargets then prints a newline.
print
  : ^(PRINT printTarget*)
    { System.out.println(); }
  ;

// This prints a single printTarget without a newline.
// "out", "unescape", "getVariable", "getFunction", "evalFunction"
// and "toDouble" are methods we wrote that were defined
// in the @members block earlier.
printTarget
  : NUMBER { out($NUMBER); }
  | STRING_LITERAL {
      String s = unescape($STRING_LITERAL.text);
      out(s.substring(1, s.length() - 1)); // removes quotes
    }
  | NAME { out(getVariable($NAME)); }
  | ^(FUNCTION NAME) { out(getFunction($NAME)); }
  // The next line uses the return value named "result"
  // from the earlier rule named "functionEval".
  | functionEval { out($functionEval.result); }
  | derivative
  ;

// This prints the derivative of a function.
// This also could have been done in place in the printTarget rule.
derivative
  : ^(DERIVATIVE NAME) {
      out(getFunction($NAME).getDerivative());
    }
  ;
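
The tree grammar above calls into Term, Polynomial and Function classes that are not listed in this article. To make the actions easier to follow, here is a minimal sketch of what the term and polynomial classes might look like. The names SketchTerm, SketchPolynomial and DomainSketchDemo, and every method body, are assumptions made for illustration; the real classes are in the downloadable source and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the article's Term class.
class SketchTerm {
  final double coefficient;
  final String variable; // null for a constant term
  final double exponent;

  // A constant term such as "1".
  SketchTerm(double coefficient) {
    this(coefficient, null, 0.0);
  }

  // A term such as "3x^2".
  SketchTerm(double coefficient, String variable, double exponent) {
    this.coefficient = coefficient;
    this.variable = variable;
    this.exponent = exponent;
  }

  double getValue(double x) {
    return coefficient * Math.pow(x, exponent);
  }

  // Power rule: d/dx (c * x^n) = (c * n) * x^(n - 1).
  SketchTerm getDerivative() {
    if (exponent == 0.0) return new SketchTerm(0.0);
    return new SketchTerm(coefficient * exponent, variable, exponent - 1.0);
  }
}

// Hypothetical stand-in for the article's Polynomial class.
class SketchPolynomial {
  private final List<SketchTerm> terms = new ArrayList<>();

  void addTerm(SketchTerm term) {
    terms.add(term);
  }

  // Sums the value of every term at x.
  double getValue(double x) {
    double sum = 0.0;
    for (SketchTerm term : terms) sum += term.getValue(x);
    return sum;
  }

  // Differentiates term by term.
  SketchPolynomial getDerivative() {
    SketchPolynomial derivative = new SketchPolynomial();
    for (SketchTerm term : terms) derivative.addTerm(term.getDerivative());
    return derivative;
  }
}

class DomainSketchDemo {
  // Builds f(x) = 3x^2 - 4x + 1, the example used in the help text.
  static SketchPolynomial example() {
    SketchPolynomial f = new SketchPolynomial();
    f.addTerm(new SketchTerm(3.0, "x", 2.0));
    f.addTerm(new SketchTerm(-4.0, "x", 1.0));
    f.addTerm(new SketchTerm(1.0));
    return f;
  }

  public static void main(String[] args) {
    SketchPolynomial f = example();
    System.out.println("f(2) = " + f.getValue(2.0));
    System.out.println("f'(2) = " + f.getDerivative().getValue(2.0));
  }
}
```

In this sketch the sign handling stays in the grammar's term rule, which negates the coefficient before constructing the term, so the domain classes only ever see signed coefficients.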

Part VI - ANTLRWorks

ANTLRWorks is a graphical grammar editor and debugger. It checks for grammar errors, including those beyond the syntax variety such as conflicting rule alternatives, and highlights them. It can display a syntax diagram for a selected rule. It provides a debugger that can step through creation of parse trees and ASTs.

ANTLRWorks 是一个图形化的文法编辑器和调试器。它可以检查文法的错误,甚至不是的语法上的而是逻辑错误,比如互相冲突的规则分支,然后高亮它们。 它可以给选中的规则显示一个语法图。

Rectangles in syntax diagrams correspond to fixed vocabulary symbols. Rounded rectangles correspond to variable symbols.

语法图中的方框对应于确定的字符。圆角矩形对应于不定的字符。

Here's an example of a syntax diagram for a selected lexer rule.

下面是一个选中规则的语法图示例。

Lexer rule syntax diagram

Here's an example of a syntax diagram for a selected parser rule.

Parser rule syntax diagram

Here's an example of requesting a grammar check, followed by a successful result.

下面是检查文法的例子,文法是正确的。

ANTLRWorks check grammar request

ANTLRWorks check grammar result

Using the ANTLRWorks debugger is simple when the lexer and parser rules are combined in a single grammar file, unlike our example. Press the Debug toolbar button (with a bug on it), enter input text or select an input file, select the start rule (allows debugging a subset of the grammar) and press the OK button. Here's an example of entering the input for a different, simpler grammar that defines the lexer and parser rules in a single file:

当 lexer 和 parser 的规则合并症一个文法文件中时,使用 ANTLRWorks 调试器会非常简单。 不像我们的例子, 按下工具栏的调试按钮(上面有一个虫子的 ), 键入输入文本或选择一个文件, 选中开始规则 (也可以只是调试一些规则)然后点 OK 按钮。 下面的例子展示了给一个简单点的文法输入文本,其 lexer 和 parser 规则都在一个文件中:

ANTLRWorks debugger input

The debugger controls and output are displayed at the bottom of the ANTLRWorks window. Here's an example using that same, simpler grammar:

调试器的控件和输出都在 ANTLRWorks 窗口的底部, 下面是使用上一个简单文法的例子:

ANTLRWorks debugger output

Using the debugger when the lexer and parser rules are in separate files, like in our example, is a bit more complicated. See the ANTLR Wiki page titled "When do I need to use remote debugging."

像我们的例子这样, lexer 和 parser 规则在不同的文件中,使用调试器会稍微复杂一点。 请看 ANTLR Wiki 页,标题 "When do I need to use remote debugging"。


Part VII - Putting It All Together

Part VII - 组装所有部件!

终于翻到这里了。。

Using Generated Classes

使用生成的类

Next we need to write a class to utilize the classes generated by ANTLR. We'll call ours Processor. This class will use MathLexer (extends Lexer), MathParser (extends Parser) and MathTree (extends TreeParser). Note that the classes Lexer, Parser and TreeParser all extend the class BaseRecognizer. Our Processor class will also use other classes we wrote to model our domain. These classes are named Term, Function and Polynomial. We'll support two modes of operation, batch and interactive.

下面我们将要写一个类来调用 ANTLR 自动生成的类。 我们叫它 Processor 。 这个类会使用 MathLexer (继承自 Lexer), MathParser (继承自 Parser) 和 MathTree (继承自 TreeParser)。 注意,所有 Lexer , Parser 和 TreeParser 都继承自类 BaseRecognizer 。 我们的 Processor 类也会使用一些其它的类,那些用来组成我们整个架构的类。 这些类是 Term , Function 和 Polynomial 。 我们将会支持两种模式的操作,批处理模式和交互式。

Here's our Processor class.

下面是我们的 Processor 类。

package com.ociweb.math;

import java.io.*;
import java.util.Scanner;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Processor {

  public static void main(String[] args) throws IOException, RecognitionException {
    if (args.length == 0) {
      new Processor().processInteractive();
    } else if (args.length == 1) { // name of file to process was passed in
      new Processor().processFile(args[0]);
    } else { // more than one command-line argument
      System.err.println("usage: java com.ociweb.math.Processor [file-name]");
    }
  }

  private void processFile(String filePath) throws IOException, RecognitionException {
    CommonTree ast = getAST(new FileReader(filePath));
    //System.out.println(ast.toStringTree()); // for debugging
    processAST(ast);
  }

  private CommonTree getAST(Reader reader) throws IOException, RecognitionException {
    MathParser tokenParser = new MathParser(getTokenStream(reader));
    MathParser.script_return parserResult =
      tokenParser.script(); // start rule method
    reader.close();
    return (CommonTree) parserResult.getTree();
  }

  private CommonTokenStream getTokenStream(Reader reader) throws IOException {
    MathLexer lexer = new MathLexer(new ANTLRReaderStream(reader));
    return new CommonTokenStream(lexer);
  }

  private void processAST(CommonTree ast) throws RecognitionException {
    MathTree treeParser = new MathTree(new CommonTreeNodeStream(ast));
    treeParser.script(); // start rule method
  }

  private void processInteractive() throws IOException, RecognitionException {
    MathTree treeParser = new MathTree(null); // a TreeNodeStream will be assigned later
    Scanner scanner = new Scanner(System.in);
    while (true) {
      System.out.print("math> ");
      String line = scanner.nextLine().trim();
      if ("quit".equals(line) || "exit".equals(line)) break;
      processLine(treeParser, line);
    }
  }

  // Note that we can't create a new instance of MathTree for each
  // line processed because it maintains the variable and function Maps.
  private void processLine(MathTree treeParser, String line) throws RecognitionException {
    // Run the lexer and token parser on the line.
    MathLexer lexer = new MathLexer(new ANTLRStringStream(line));
    MathParser tokenParser = new MathParser(new CommonTokenStream(lexer));
    MathParser.statement_return parserResult =
      tokenParser.statement(); // start rule method

    // Use the token parser to retrieve the AST.
    CommonTree ast = (CommonTree) parserResult.getTree();
    if (ast == null) return; // line is empty

    // Use the tree parser to process the AST.
    treeParser.setTreeNodeStream(new CommonTreeNodeStream(ast));
    treeParser.statement(); // start rule method
  }
} // end of Processor class

Ant Tips

Ant 提示

Ant is a great tool for automating tasks used to develop and test grammars. Suggested independent "targets" include the following.

Ant 工具可以很好的帮助开发和调试文法的自动化任务。 支持下面每一个“目标”。

  • Use org.antlr.Tool to generate Java classes and ".tokens" files from each grammar file.
    • ".tokens" files assign integer constants to token names and are used by org.antlr.Tool when processing subsequent grammar files.
    • The "uptodate" task can be used to determine whether the grammar has changed since the last build.
    • The "unless" target attribute can be used to avoid running org.antlr.Tool if the grammar hasn't changed since the last build.
  • 使用 org.antlr.Tool 来从每一个文法文件中生成 Java 类和 ".tokens" 文件。
    • ".tokens" 给 token 绑定一个常量整数, 且 org.antlr.Tool 处理后续的文法文件时会用到它。
    • "uptodate" 任务可以用来确定文法在上次构建后是否有所改变。
    • "unless" 任务属性可以用来避免在上次构建后文法没有改变而 org.antlr.Tool 依然运行。
  • Compile Java source files.
  • 编译 Java 源文件。
  • Run automated tests.
  • 执行自动测试。
  • Run the application using a specific file as input.
  • 使用特定文件作为输入来运行程序。
  • Delete all generated files (clean target).
  • 删除所有自动生成的文件 (清理目标)。

For examples of all of these, download the source code from the URL listed at the end of this article and see the build.xml file.

比如,对于所有这些任务,可以从文章末尾列出的URL下载源文件然后查看 build.xml 文件。
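
The "uptodate" idea from the list above can be pictured outside of Ant as a timestamp comparison between a grammar file and a file generated from it. The sketch below is only an illustration of that check in plain Java. The class name UpToDateSketch and the method isUpToDate are hypothetical, and the rule used here (the generated file exists and is at least as new as the grammar) is an assumption that may not match Ant's exact behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of an up-to-date check; it does not use Ant.
class UpToDateSketch {
  // Returns true when the generated file exists and is
  // at least as new as the grammar it was generated from.
  static boolean isUpToDate(Path grammar, Path generated) throws IOException {
    if (!Files.exists(generated)) return false;
    return Files.getLastModifiedTime(generated)
        .compareTo(Files.getLastModifiedTime(grammar)) >= 0;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java UpToDateSketch grammar-file generated-file");
      return;
    }
    boolean current = isUpToDate(Paths.get(args[0]), Paths.get(args[1]));
    System.out.println(current
        ? "up to date; generation can be skipped"
        : "stale; run org.antlr.Tool again");
  }
}
```

In a build file the result of such a check would typically be stored in a property, and the generation target would be skipped when that property is set.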


Part VIII - Wrap Up

Hidden Tokens

By default the parser only processes tokens from the default channel. It can, however, request tokens from other channels such as the hidden channel. Tokens are assigned unique, sequential indexes regardless of the channel to which they are written. This allows parser code to determine the order in which the tokens were encountered, even when they come from different channels.

解析器默认只处理默认通道里的 token 。 它也可以从其它通道比如隐藏通道,里请求 token 。 Token 会被按连续顺序,不管它将要被写入哪个通道,赋给一个唯一的索引值。 这样可以允许解析器确定 token 被发现的顺序(不管他们要被写入哪个通道)。

Here are some related public constants and methods from the Token class.

下面是一些 Token 类中相关的公有常量和函数。

  • static final int DEFAULT_CHANNEL
  • static final int HIDDEN_CHANNEL
  • int getChannel() This gets the number of the channel where this Token was written. 取得此 Token 要被写入的通道数
  • int getTokenIndex() This gets the index of this Token. 取得当前 Token 的索引

Here are some related public methods from the CommonTokenStream class, which implements the TokenStream interface.

下面是一些 CommonTokenStream 类相关的公有函数,它实现了 TokenStream 接口。

  • Token get(int index) This gets the Token found at a given position in the input. 取得输入中 index 位置的 Token
  • List getTokens(int start, int stop) This gets a List of Tokens found between given positions in the input. 取得输入中指定一个区间的一串 Token
  • int index() This gets the index of the last Token that was read. 取得最近一次读取的 Token 的index
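
To make the relationship between channels and token indexes concrete, here is a small sketch that does not use the ANTLR runtime at all. It models tokens with a channel and an index, sends whitespace to a hidden channel, and then uses the indexes to find the hidden tokens that sit between two default-channel tokens. The class ChannelSketch, its Tok type, its methods and the channel numbers are all hypothetical stand-ins for illustration, not the Token or CommonTokenStream API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of channels and token indexes; not the ANTLR runtime.
class ChannelSketch {
  static final int DEFAULT_CHANNEL = 0; // values assumed for this sketch only
  static final int HIDDEN_CHANNEL = 99;

  static class Tok {
    final String text;
    final int channel;
    final int index;

    Tok(String text, int channel, int index) {
      this.text = text;
      this.channel = channel;
      this.index = index;
    }
  }

  // Splits input into runs of whitespace (hidden) and non-whitespace (default).
  // Indexes are assigned in encounter order, whatever the channel.
  static List<Tok> tokenize(String input) {
    List<Tok> tokens = new ArrayList<>();
    int i = 0;
    while (i < input.length()) {
      int start = i;
      boolean ws = Character.isWhitespace(input.charAt(i));
      while (i < input.length() && Character.isWhitespace(input.charAt(i)) == ws) i++;
      int channel = ws ? HIDDEN_CHANNEL : DEFAULT_CHANNEL;
      tokens.add(new Tok(input.substring(start, i), channel, tokens.size()));
    }
    return tokens;
  }

  // The view a parser would get when it only reads the default channel.
  static List<Tok> defaultChannel(List<Tok> tokens) {
    List<Tok> result = new ArrayList<>();
    for (Tok t : tokens) {
      if (t.channel == DEFAULT_CHANNEL) result.add(t);
    }
    return result;
  }

  // Hidden tokens located between two tokens, found through their indexes.
  static List<Tok> hiddenBetween(List<Tok> tokens, Tok left, Tok right) {
    List<Tok> result = new ArrayList<>();
    for (int i = left.index + 1; i < right.index; i++) {
      Tok t = tokens.get(i);
      if (t.channel == HIDDEN_CHANNEL) result.add(t);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Tok> all = tokenize("a =  19");
    for (Tok t : defaultChannel(all)) {
      System.out.println(t.index + ": " + t.text);
    }
  }
}
```

Because the indexes are shared across channels, the gaps in the default-channel indexes show exactly where hidden tokens were skipped.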

Advanced Topics

We have demonstrated the basics of using ANTLR. For information on advanced topics, see the slides from the presentation on which this article was based at http://www.ociweb.com/mark/programming/ANTLR3.html. This web page contains links to the slides and the code presented in this article. The advanced topics covered in these slides include the following.

我们已经展示了 ANTLR 基本使用方法。 一些高阶内容方面的信息,可以看本文所基于内容的幻灯片,地址: http://www.ociweb.com/mark/programming/ANTLR3.html 。 这个页面包含了幻灯片的链接和本文展示的代码。 这些幻灯片涵盖了下面几个高阶部分

  • remote debugging
  • using the StringTemplate library
  • details on the use of lookahead in grammars
  • three kinds of semantic predicates: validating, gated and disambiguating
  • syntactic predicates
  • customizing error handling
  • gUnit grammar unit testing framework
  • 远程调试
  • 使用 StringTemplate 库
  • 在文法中使用前看的详细信息
  • 三种语义断言: validating, gated 和 disambiguating
  • 语法断言
  • 自定义错误处理
  • gUnit grammar 单元测试框架

Projects Using ANTLR

Many programming languages have been implemented using ANTLR. These include Boo, Groovy, Mantra, Nemerle and XRuby.

很都编程语言都被用 ANTLR 重新实现了。 包括 Boo, Groovy, Mantra, Nemerle 和 XRuby 。

Many other kinds of tools use ANTLR in their implementation. These include Hibernate (for its HQL to SQL query translator), IntelliJ IDEA, Jazillian (translates COBOL, C and C++ to Java), JBoss Rules (was Drools), Keynote (from Apple), WebLogic (from Oracle), and many more.

很多其它类的工具也使用了 ANTLR 。 包括 Hibernate ( HQL 到 SQL 查询转换器部分), Intellij IDEA, Jazillian (把 COBOL, C 和 C++ 翻译成 Java), JBoss Rules (曾名 Drools), Keynote (源自 Apple), WebLogic (源自 Oracle), 等等等等。

Books

Currently only one book on ANTLR is available. Terence Parr, creator of ANTLR, wrote "The Definitive ANTLR Reference." It is published by "The Pragmatic Programmers." Terence is working on a second book for the same publisher that may be titled "ANTLR Recipes."

当前, ANTLR 书只有一本。(不知道现在咋样了。。 ) Terence Parr, ANTLR 作者,编写的 "The Definitive ANTLR Reference" 《ANTLR 权威参考手册?》由 "The Pragmatic Programmers" 出版。 Terence 正在致力于帮该出版社写另一本可能会命名为 "ANTLR 秘诀"。

Summary

There you have it! ANTLR is a great tool for generating custom language parsers. We hope this article will make it easier to get started creating validators, processors and translators.

你已成佛! ANTLR 是一个非常好的生成特制语言解析器的工具。 我们希望本文让开始创建 验证器,处理器和翻译器 的过程变简单些了。

References

Posted on 2009-12-07 00:13 by Morya
