0%

CMPT379编译器Lab0:Decaf语法规则

Decaf 是一种强类型的类 C 语言,功能集与通常作为成熟编程语言的一部分相比大幅缩减(这样做是为了保持编程分配易于管理)

下面是一个 Decaf 的案例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
extern func print_int(int) void;

package GreatestCommonDivisor {
var a int = 10;
var b int = 20;

func main() int {
var x, y, z int;
x = a;
y = b;
z = gcd(x, y);

// print_int is part of the standard input-output library
print_int(z);
}

// function that computes the greatest common divisor
func gcd(a int, b int) int {
if (b == 0) { return(a); }
else { return( gcd(b, a % b) ); }
}
}

语法是使用 EBNF(扩展巴科斯范式)指定的:

1
2
3
4
5
6
7
8
9
Production  = production_name "=" [ Expression ] "." . /* 定义 */
Expression = Alternative { "|" Alternative } . /* 表达 */
Alternative = Term { Term } . /* 选择 */
Term = production_name | token [ "..." token ] | Group | Option | Repetition | Repetition+ | CommaList . /* 术语 */
Group = "(" Expression ")" . /* 分组 */
Option = "[" Expression "]" . /* 可选 */
Repetition = "{" Expression "}" . /* 重复 */
Repetition+ = "{" Expression "}+" .
CommaList = "{" Expression "}+," .

Production 是由 Term 和以下运算符构造的表达式,优先级越来越高:

1
2
3
4
5
6
|    alternation
() grouping
[] option (0 or 1 Expression)
{} repetition (0 to n Expressions)
{}+ repetition (1 to n Expressions)
{}+, comma list (1 to n Expressions comma separated, e.g. x, y, z)

Decaf 源代码编码为 ASCII 文本,大写和小写字符被视为不同的字符

1
2
3
4
letter        = "A" ... "Z" | "a" ... "z" | "_" . /* 字母 */
decimal_digit = "0" ... "9" . /* 10进制数字 */
hex_digit = "0" ... "9" | "A" ... "F" | "a" ... "f" . /* 16进制数字 */
digit = "0" ... "9" . /* 数字 */

以下关键字是保留的,不能用作标识符:

1
2
3
bool    break   continue  else   extern  false   
for func if int null package
return string true var void while

以下字符序列表示运算符(请参阅下面的运算符部分)和分隔符:

1
2
3
{  }   [   ]   ,   ;   (   )  =  
- ! + * / << >> < >
% <= >= == != && || .

Decaf 有四种类型:void,布尔,整数和字符串

  • 字符串类型只能与外部函数一起使用(这里的 “外部函数” 指的是程序内定义的函数,或者用 C 实现的 Decaf 标准库函数)
  • void 类型仅用于函数的返回类型 MethodType,不用于变量声明 Type
1
2
3
ExternType = ( string | Type ) .
Type = ( int | bool ) .
MethodType = ( void | Type ) .

Decaf 有2种数组:整数,布尔数组

1
ArrayType = "[" int_lit "]" Type .

Decaf 程序可以访问链接的外部函数(例如:用 C 实现的 Decaf 标准库函数),以及作为外部函数从 Decaf 程序内部访问的函数(目前,只允许声明外部函数,无法声明外部数据)

1
2
Externs    = { ExternDefn } .
ExternDefn = extern func identifier "(" [ { ExternType }+, ] ")" MethodType ";" .

Decaf 具有全局变量,其范围仅限于其包,出现在任何方法声明之前

Decaf 中的全局变量称为字段声明,它们可以是没有初始化的简单声明(假定编译器初始化为零),也可以使用常量赋值来声明非数组变量,变量始终使用保留字 var 定义

1
2
3
4
FieldDecls = { FieldDecl } .
FieldDecl = var { identifier }+, Type ";" . /* 简单变量(未初始化) */
FieldDecl = var { identifier }+, ArrayType ";" . /* 数组变量 */
FieldDecl = var identifier Type "=" Constant ";" . /* 简单变量(初始化) */

Decaf 中的函数或方法以保留字 func 开头,然后是方法的名称,括号中是参数列表,后跟方法的返回类型

1
2
MethodDecls = { MethodDecl } .
MethodDecl = func identifier "(" [ { identifier Type }+, ] ")" MethodType Block .

Decaf 的块有两个部分,一个用于局部变量定义,一个用于语句

1
Block = "{" VarDecls Statements "}" .

Decaf 的语句有如下几种:

  • 赋值语句
1
2
3
Statement = Assign ";" .
Assign = Lvalue "=" Expr .
Lvalue = identifier | identifier "[" Expr "]" . /* 变量 | 数组[表达式] */
  • 函数调用
1
2
3
Statement  = MethodCall ";" .
MethodCall = identifier "(" [ { MethodArg }+, ] ")" .
MethodArg = Expr | string_lit .
  • IF-ELSE 语句
1
Statement = if "(" Expr ")" Block [ else Block ] .
  • WHILE 语句
1
Statement =  while "(" Expr ")" Block .
  • FOR 语句
1
Statement = for "(" { Assign }+, ";" Expr ";" { Assign }+, ")" Block .
  • RETURN 语句
1
Statement = return [ "(" [ Expr ] ")" ] ";" .
  • BREAK 语句
1
Statement = break ";" .
  • CONTINUE 语句
1
Statement = continue ";" .

Decaf 表达式中的基本值是操作数,也可以是各种表达式的组合

1
2
3
Expr = identifier .
Expr = MethodCall .
Expr = Constant .
1
2
3
4
5
6
7
BinaryOperator = ( ArithmeticOperator | BooleanOperator ) .
ArithmeticOperator = ( "+" | "-" | "*" | "/" | "<<" | ">>" | "%" ) .
BooleanOperator = ( "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) .

Expr = Expr BinaryOperator Expr . /* 二元运算符 */
Expr = UnaryOperator Expr . /* 一元运算符 */
Expr = "(" Expr ")" .

Decaf 中只有两个一元运算符:一个用于逻辑否定,另一个用于算术表达式的负数

1
2
3
UnaryOperator = ( UnaryNot | UnaryMinus ) .
UnaryNot = "!" .
UnaryMinus = "-" .

Decaf 二元运算符分为布尔二元运算符和算术二元运算符

1
2
3
BinaryOperator = ( ArithmeticOperator | BooleanOperator ) .
ArithmeticOperator = ( "+" | "-" | "*" | "/" | "<<" | ">>" | "%" ) .
BooleanOperator = ( "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) .

一元运算符 UnaryMinus 具有最高优先级,优先级定义如下:

  • UnaryMinus
  • UnaryNot
  • * / % << >>
  • + -
  • == != < <= > >=
  • &&
  • ||

Decaf 的 token 表列如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
T_AND            &&
T_ASSIGN =
T_BOOLTYPE bool
T_BREAK break
T_CHARCONSTANT char_lit (see section on Character literals)
T_COMMA ,
T_COMMENT comment
T_CONTINUE continue
T_DIV /
T_DOT .
T_ELSE else
T_EQ ==
T_EXTERN extern
T_FALSE false
T_FOR for
T_FUNC func
T_GEQ >=
T_GT >
T_ID identifier (see section on Identifiers)
T_IF if
T_INTCONSTANT int_lit (see section on Integer literals)
T_INTTYPE int
T_LCB {
T_LEFTSHIFT <<
T_LEQ <=
T_LPAREN (
T_LSB [
T_LT <
T_MINUS -
T_MOD %
T_MULT *
T_NEQ !=
T_NOT !
T_NULL null
T_OR ||
T_PACKAGE package
T_PLUS +
T_RCB }
T_RETURN return
T_RIGHTSHIFT >>
T_RPAREN )
T_RSB ]
T_SEMICOLON ;
T_STRINGCONSTANT string_lit (see section on String literals)
T_STRINGTYPE string
T_TRUE true
T_VAR var
T_VOID void
T_WHILE while
T_WHITESPACE whitespace (see section on Whitespace)

Decaf 的规则表如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
Program = Externs package identifier "{" FieldDecls MethodDecls "}" .
Externs = { ExternDefn } .
ExternDefn = extern func identifier "(" [ { ExternType }+, ] ")" MethodType ";" .
FieldDecls = { FieldDecl } .
FieldDecl = var { identifier }+, Type ";" .
FieldDecl = var { identifier }+, ArrayType ";" .
FieldDecl = var identifier Type "=" Constant ";" .
MethodDecls = { MethodDecl } .
MethodDecl = func identifier "(" [ { identifier Type }+, ] ")" MethodType Block .
Block = "{" VarDecls Statements "}" .
VarDecls = { VarDecl } .
VarDecl = var { identifier }+, Type ";" .
Statements = { Statement } .
Statement = Block .
Statement = Assign ";" .
Assign = Lvalue "=" Expr .
Lvalue = identifier | identifier "[" Expr "]" .
Statement = MethodCall ";" .
MethodCall = identifier "(" [ { MethodArg }+, ] ")" .
MethodArg = Expr | string_lit .
Statement = if "(" Expr ")" Block [ else Block ] .
Statement = while "(" Expr ")" Block .
Statement = for "(" { Assign }+, ";" Expr ";" { Assign }+, ")" Block .
Statement = return [ "(" [ Expr ] ")" ] ";" .
Statement = break ";" .
Statement = continue ";" .
Expr = identifier .
Expr = MethodCall .
Expr = Constant .
UnaryOperator = ( UnaryNot | UnaryMinus ) .
UnaryNot = "!" .
UnaryMinus = "-" .
BinaryOperator = ( ArithmeticOperator | BooleanOperator ) .
ArithmeticOperator = ( "+" | "-" | "*" | "/" | "<<" | ">>" | "%" ) .
BooleanOperator = ( "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) .
Expr = Expr BinaryOperator Expr .
Expr = UnaryOperator Expr .
Expr = "(" Expr ")" .
Expr = identifier "[" Expr "]" .
ExternType = ( string | Type ) .
Type = ( int | bool ) .
MethodType = ( void | Type ) .
BoolConstant = ( true | false ) .
ArrayType = "[" int_lit "]" Type .
Constant = ( int_lit | char_lit | BoolConstant ) .

确保在 Decaf 编译器中实现以下类型检查:

  • 二元和一元 + - * / % >> << < > <= >= 仅适用于整数表达式
  • 二元和一元 && || ! 仅适用于布尔表达式
  • 二元 == != 适用于任何类型,但两个操作数必须具有相同的类型
  • 对函数参数的赋值是有效的,应该像局部变量一样更改值
  • && || 和运算符短路(这已经在规范中指定)
  • 如果一个块中有多个 return 语句,则只使用第一个语句,但其他语句仍应进行类型检查
  • 为标量编制索引是一个语义错误:{ var x int; x[0] = 1;}
  • 使用布尔值编制索引是一个语义错误:func main() int { var x int; x = xs[true];}
  • 对循环条件使用非布尔表达式是一个语义错误:{ while (1) {} }
  • 在 IF 语句条件中使用非布尔表达式是语义错误:{ if (0) {} }
  • 在具有 void 返回类型的函数中不允许使用带有表达式的 return 语句:func foo() void { return (1); }
  • 在非 void 函数中没有表达式的 return 语句会生成默认返回值
  • 不能在表达式中使用 void 函数:func foo() void {} func main() int { if (foo()) {} }
  • 不能调用参数数错误的函数
  • 查找变量类型的定义与分配给该变量的值之间存在类型不匹配的所有情况
  • 查找表达式格式正确的所有情况,其中二进制和一元运算符与关系运算符和相等运算符区分开来
  • 在将变量用作 decaf 程序中的左值或右值之前,请检查所有变量是否在正确的范围内定义
  • 检查方法中的 return 语句是否与方法定义中的返回类型匹配:func foo() bool{return(10);}