Flex (Fast Lexical Analyzer Generator): a flexible, fast generator of lexical analyzers
• A pattern is an extended regular expression; an action is an arbitrary C statement.
  - If the action is empty, then when the pattern is matched, the input token is simply discarded.
• Using Cygwin tools on PC:
W2K% flex count.flex
W2K% gcc lex.yy.c -lfl
W2K% ./a.exe < count.flex
# of lines = 12, # of chars = 250
CS780(Prasad)
L5Flex
Rule numbering used in the “self-scan” output:
Rule 1   "/*"                    BEGIN(comment);
Rule 2   <comment>[^*\n]*        /* eat anything that's not a '*' */
Rule 3   <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
Rule 4   <comment>\n             ++line_num;
Rule 5   <comment>"*"+"/"        BEGIN(INITIAL);
Definitions
• C Code
  - include files
  - global variables
• Names defined for regular expressions
• Start Conditions defined (exclusive states, inclusive states)

%{
#include <stdio.h>
%}
DIGIT   [0-9]
ID      [a-zA-Z][a-zA-Z0-9_]*
%x INCOMMENT
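A name definition such as ID can be read as shorthand for a small recognizer. The following is only a hand-written sketch of what `[a-zA-Z][a-zA-Z0-9_]*` accepts, not what flex generates (flex builds a table-driven automaton). The function name `match_id` is made up for illustration, and the sketch assumes the default "C" locale so that `isalpha` and `isalnum` cover exactly the ASCII letters and digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest prefix of s that matches
   ID = [a-zA-Z][a-zA-Z0-9_]*, or 0 if s does not start with an ID. */
size_t match_id(const char *s)
{
    assert(s != NULL);
    if (!isalpha((unsigned char)s[0]))
        return 0;                 /* first character must be a letter */
    size_t n = 1;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        ++n;                      /* then letters, digits, underscores */
    return n;
}
```

Like a flex rule, the function takes the longest possible match, so on the input `foo_1+2` it stops in front of the `+`.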
Rules
• This section contains a list of pairs of the form:
      pattern   action
printRulesStr.flex
%x comment
%%
        int line_num = 1;
        printf(" INITIAL: Default ");

"/*"                    { BEGIN(comment);
                          printf(", R 1 : |%s|, COMMENT : ", yytext); }
In each rule, the pattern must be unindented and the action must begin on the same line. The pattern ends at the first non-escaped whitespace character; the remainder of the line is its action.
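The pattern/action split can be sketched in C. This is a simplified illustration, not flex's own parser: the helper name `pattern_length` is hypothetical, and besides backslash escapes it assumes that whitespace inside double quotes or inside a `[...]` character class also belongs to the pattern, as in the commonly written `[ \t]+`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the pattern part of a rule line, i.e. the
   index of the first blank or tab that is neither backslash-escaped nor
   inside double quotes or a [...] class. The action starts after it. */
size_t pattern_length(const char *line)
{
    assert(line != NULL);
    int in_quotes = 0, in_class = 0;
    size_t i = 0;
    while (line[i] != '\0') {
        char c = line[i];
        if (c == '\\' && line[i + 1] != '\0') {
            i += 2;               /* skip the escaped character */
            continue;
        }
        if (in_quotes) {
            if (c == '"') in_quotes = 0;
        } else if (in_class) {
            if (c == ']') in_class = 0;
        } else if (c == '"') {
            in_quotes = 1;
        } else if (c == '[') {
            in_class = 1;
        } else if (c == ' ' || c == '\t') {
            break;                /* the pattern ends here */
        }
        ++i;
    }
    return i;
}
```

For the line `username printf("x");` the function returns 8, the length of `username`; everything after the following blank is the action.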
“Self-scanning” comment.flex
%x comment
%%
        int line_num = 1;

"/*"                    BEGIN(comment);
<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
%%
<comment>[^*\n]*        printf(", R 2 : |%s|", yytext);
<comment>"*"+[^*/\n]*   printf(", R 3 : |%s|", yytext);
<comment>\n             printf(", R 4 : |%s| \n, COMMENT: ", yytext);
<STRING>[^"]* { /* eat up the string body ... */ …}
A Simple Example
        int num_lines = 0, num_chars = 0;

%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
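For comparison, the effect of these two rules can be written by hand in plain C. This is only a sketch of the behaviour, not the code flex emits; the function name `count_text` and its string-based interface are assumptions made for the example (a real scanner reads from a stream).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two rules:
     \n   ++num_lines; ++num_chars;
     .    ++num_chars;                                   */
void count_text(const char *text, int *num_lines, int *num_chars)
{
    assert(text != NULL && num_lines != NULL && num_chars != NULL);
    *num_lines = 0;
    *num_chars = 0;
    for (size_t i = 0; text[i] != '\0'; ++i) {
        if (text[i] == '\n')
            ++*num_lines;         /* only the \n rule counts a line */
        ++*num_chars;             /* both rules count the character */
    }
}
```

Every input character is matched by exactly one of the two rules, which is why the loop needs no fallback case.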
Flex input file format
• The flex input file consists of three sections, separated by a line with just `%%' in it:
      definitions
      %%
      rules
      %%
      user code
• Simple Example
      %%
      username    printf( "%s", getlogin() );
• Start State
  - Mechanism for conditionally activating rules.
Any rule whose pattern is prefixed with "<sc>" will only be active when the scanner is in the start condition named "sc".
• The scanner is called as a subroutine whenever the parser needs the next token.
      input.flex (flex format input file)  ->  Flex  ->  lex.yy.c (yylex() routine)
      y.tab.h (header file definitions for tokens and types for token attributes)
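The calling relationship between parser and scanner can be sketched in plain C. Everything here is a hypothetical stand-in: `next_token` plays the role of yylex(), the `enum token` values play the role of the token definitions in the generated header, and `count_tokens` plays the role of the parser's loop. The only behaviour taken from flex itself is that the scanner returns 0 at end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; a real setup would take them from the header. */
enum token { TOK_EOF = 0, TOK_NUMBER = 258, TOK_ID = 259, TOK_OTHER = 260 };

struct scanner { const char *pos; };       /* remaining input */

/* Stand-in for yylex(): each call consumes and returns one token. */
int next_token(struct scanner *sc)
{
    assert(sc != NULL && sc->pos != NULL);
    while (isspace((unsigned char)*sc->pos))
        ++sc->pos;                         /* whitespace is discarded */
    if (*sc->pos == '\0')
        return TOK_EOF;
    if (isdigit((unsigned char)*sc->pos)) {
        while (isdigit((unsigned char)*sc->pos))
            ++sc->pos;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)*sc->pos)) {
        while (isalnum((unsigned char)*sc->pos) || *sc->pos == '_')
            ++sc->pos;
        return TOK_ID;
    }
    ++sc->pos;                             /* any other single character */
    return TOK_OTHER;
}

/* Stand-in for the parser: it pulls tokens on demand until end of input. */
int count_tokens(const char *input)
{
    struct scanner sc = { input };
    int n = 0;
    while (next_token(&sc) != TOK_EOF)
        ++n;
    return n;
}
```

The point of the design is that the scanner keeps its own position between calls, so the parser never looks at raw characters.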
FLEX Fast Lexical Analyzer Generator
Adapted from material in: Gnu Manual for Flex by Vern Paxson
Overview of Flex
• Scanner generator
• Interface with Parser
<comment>"*"+"/"        { BEGIN(INITIAL);
                          printf(", R 5 : |%s|, INITIAL : Default ", yytext); }
.
\n                      printf("\n INITIAL : Default ");
%%
Start State Example
Here is a scanner which recognizes (and discards) C comments while maintaining a count of the current input line.
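%x comment
%%
        int line_num = 1;

"/*"                    BEGIN(comment);
<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
%%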
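The comment scanner behaves like a two-state machine. The following C function is a hand-written sketch of that behaviour under stated assumptions, not the generated scanner: the name `strip_comments`, the string and buffer interface, and the `enum state` values are invented for the example. Text outside comments is echoed, as flex's default rule would do, and, as in the rules shown, only newlines inside comments increment the line counter.

```c
#include <assert.h>
#include <stddef.h>

enum state { INITIAL, COMMENT };   /* stand-ins for the start conditions */

/* Hypothetical equivalent of the five rules: copies everything outside
   C comments to out and returns the final value of line_num. */
int strip_comments(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);
    enum state sc = INITIAL;
    int line_num = 1;
    size_t n = 0;                          /* characters written to out */
    for (size_t i = 0; in[i] != '\0'; ++i) {
        if (sc == INITIAL) {
            if (in[i] == '/' && in[i + 1] == '*') {
                sc = COMMENT;              /* rule 1: BEGIN(comment) */
                ++i;                       /* consume the '*' as well */
            } else if (n + 1 < out_size) {
                out[n++] = in[i];          /* default rule: echo */
            }
        } else {
            if (in[i] == '\n') {
                ++line_num;                /* rule 4 */
            } else if (in[i] == '*' && in[i + 1] == '/') {
                sc = INITIAL;              /* rule 5: BEGIN(INITIAL) */
                ++i;                       /* consume the '/' as well */
            }
            /* rules 2 and 3: other comment text is discarded */
        }
    }
    out[n] = '\0';
    return line_num;
}
```

The `sc` variable does the job of the start condition: the comment rules are only tried while it holds COMMENT.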
User code of the simple example (the counter using num_lines and num_chars):
%%
int main(void)
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
    return 0;
}
Generating Scanner
UNIX% flex count.flex
UNIX% gcc lex.yy.c -lfl
UNIX% ./a.out < count.flex
# of lines = 12, # of chars = 250
• If an input character matches no pattern, the scanner's default action writes a copy of that character to the output.
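The default action is what makes the one-rule `username` example useful: everything that is not the word "username" passes through unchanged. A minimal C sketch of that behaviour follows. The function name `expand_username` and the buffer interface are assumptions for illustration, and the login name is passed in as a parameter here instead of being obtained from getlogin().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy in to out, replacing every occurrence of
   "username" with login; all other characters are echoed unchanged,
   as the default rule would do. Returns the number of bytes written. */
size_t expand_username(const char *in, const char *login,
                       char *out, size_t out_size)
{
    assert(in != NULL && login != NULL && out != NULL && out_size > 0);
    static const char keyword[] = "username";
    const size_t klen = sizeof keyword - 1;
    size_t n = 0;                          /* bytes written so far */
    while (*in != '\0' && n + 1 < out_size) {
        if (strncmp(in, keyword, klen) == 0) {
            size_t room = out_size - 1 - n;
            size_t len = strlen(login);
            if (len > room)
                len = room;                /* truncate if out is too small */
            memcpy(out + n, login, len);   /* the rule's action */
            n += len;
            in += klen;
        } else {
            out[n++] = *in++;              /* default rule: echo */
        }
    }
    out[n] = '\0';
    return n;
}
```

As with the flex pattern, the match is purely textual, so "username" is also replaced when it occurs inside a longer word.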
Auxiliary Routines
• The user code section is simply copied to `lex.yy.c' verbatim. It holds companion routines that call, or are called by, the scanner. This section is optional; if it is missing, the second `%%' in the input file may be omitted too.