Chapter 3: Finite Automata


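Transitions for unsigned numbers, states 20-27 (start states 20 and 25; * marks states that retract one character):
 20 on digit → 21; 21 loops on digit; 21 on '.' → 22; 22 on digit → 23; 23 loops on digit; 23 on other → 24*
 25 on digit → 26; 26 loops on digit; 26 on other → 27*
 On acceptance: return(num, install_num())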
The lexeme for a given token must be the longest possible ("greedy" matching).
Questions: Is the order in which the diagrams are tried important for unsigned numbers? Why are there no TDs for then, else, and if? (They are matched by the same diagram as id.)
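The longest-lexeme rule can be illustrated with a small C sketch. Instead of trying the three diagrams of Fig.3.14 one after another, this version (a swapped-in technique, not the slides' method) runs one combined DFA for digit+ (. digit+)? (E (+|-)? digit+)? and remembers the last accepting position, so that input such as "12.x" yields the lexeme "12" and the "." is left unread. The function name match_unsigned_num and the state numbers 0-6 are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the longest prefix of s that matches
   digit+ (. digit+)? (E (+|-)? digit+)?, or 0 if s does not start with a digit.
   Hypothetical states: 1, 3 and 6 are accepting. */
size_t match_unsigned_num(const char *s)
{
    int state = 0;
    size_t i = 0, last_accept = 0;

    assert(s != NULL);
    for (;; i++) {
        unsigned char c = (unsigned char)s[i];
        int is_digit = isdigit(c) != 0;
        int next = -1;                        /* -1 means: no edge for c */

        switch (state) {
        case 0: if (is_digit) next = 1; break;
        case 1: if (is_digit) next = 1;       /* integer part */
                else if (c == '.') next = 2;
                else if (c == 'E') next = 4;
                break;
        case 2: if (is_digit) next = 3; break; /* at least one fraction digit */
        case 3: if (is_digit) next = 3;
                else if (c == 'E') next = 4;
                break;
        case 4: if (c == '+' || c == '-') next = 5;
                else if (is_digit) next = 6;
                break;
        case 5: if (is_digit) next = 6; break;
        case 6: if (is_digit) next = 6; break; /* exponent digits */
        }
        if (next < 0)
            return last_accept;               /* retract to the last accepting point */
        state = next;
        if (state == 1 || state == 3 || state == 6)
            last_accept = i + 1;              /* s[0..i] is a complete number */
    }
}
```

For example, match_unsigned_num("3.14E+2 ") returns 7, while match_unsigned_num("1.5E-") returns 3 because the incomplete exponent is given back to the input.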
Keyword table:
 if       257
 then     258
 begin    259
 ...
• When a match is found, the token is returned along with its symbolic value, e.g., "then", 258.
• If no match is found, it is assumed that an id has been discovered.
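A keyword table of this kind can be sketched in C as an array of (lexeme, token) pairs that is searched after the id diagram has matched. The token values 257-259 come from the keyword table on this slide; the ID code 260, the names get_token and struct keyword, and the linear search are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Keyword codes from the slide; ID = 260 is a hypothetical code. */
enum { IF = 257, THEN = 258, BEGIN = 259, ID = 260 };

struct keyword {
    const char *lexeme;   /* string version of the keyword */
    int token;            /* associated token value */
};

/* Hypothetical table: only the entries shown on the slide. */
static const struct keyword keywords[] = {
    { "if",    IF    },
    { "then",  THEN  },
    { "begin", BEGIN },
};

/* Called on a lexeme matched by the id diagram: returns the keyword's
   token value, or ID when the lexeme is not a keyword. */
int get_token(const char *lexeme)
{
    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(keywords[i].lexeme, lexeme) == 0)
            return keywords[i].token;
    return ID;
}
```

With this table, get_token("then") returns 258 and get_token("count") returns ID. A production scanner would more likely use a hash table, or preload the keywords into the symbol table as the slides suggest.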
Regular Expression   Token   Attribute-Value
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE
Fig.3.10 Regular-expression patterns for tokens. (P.99)
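Transitions (start state 0; * marks states that retract one character):
 0 on '<' → 1;  0 on '=' → 5: return(relop, EQ);  0 on '>' → 6
 1 on '=' → 2: return(relop, LE)
 1 on '>' → 3: return(relop, NE)
 1 on other → 4*: return(relop, LT)
 6 on '=' → 7: return(relop, GE)
 6 on other → 8*: return(relop, GT)
Fig.3.12 Transition diagram for relational operators.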
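Fig.3.12 translates almost directly into a switch on the state number. The sketch below is an illustration under stated assumptions, not the slides' code: it scans a NUL-terminated string, reports the attribute from Fig.3.10 together with the lexeme length, and models the retraction at the starred states 4 and 8 by not counting the extra character. The names match_relop, relop_attr, and NOT_RELOP are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Attribute values of the relop token (Fig.3.10); NOT_RELOP is hypothetical. */
enum relop_attr { LT, LE, EQ, NE, GT, GE, NOT_RELOP };

/* Runs the Fig.3.12 diagram on s. On success, *len is the lexeme length. */
enum relop_attr match_relop(const char *s, size_t *len)
{
    int state = 0;
    size_t forward = 0;                  /* index of the next unread character */

    assert(s != NULL && len != NULL);
    for (;;) {
        char c = s[forward];             /* plays the role of nextchar() */
        switch (state) {
        case 0:
            if (c == '<')      state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else return NOT_RELOP;       /* no edge: a scanner would call fail() */
            forward++;
            break;
        case 1:
            if (c == '=') { *len = forward + 1; return LE; }  /* state 2 */
            if (c == '>') { *len = forward + 1; return NE; }  /* state 3 */
            *len = forward; return LT;   /* state 4*: "other" is not consumed */
        case 5:
            *len = forward; return EQ;   /* '=' on its own */
        case 6:
            if (c == '=') { *len = forward + 1; return GE; }  /* state 7 */
            *len = forward; return GT;   /* state 8*: "other" is not consumed */
        default:
            return NOT_RELOP;
        }
    }
}
```

Keeping the figure's state numbers in the case labels makes it easy to check the code against the diagram edge by edge.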
Compilers: Principles, Techniques, and Tools
What Else Does the Lexical Analyzer Do? (P.102)
All keywords/reserved words are matched as ids.
• After the match, the symbol table or a special keyword table is consulted.
• The keyword table contains the string form of every keyword together with its associated token value.
© 2005 ECNU SEI
What Else Does the Lexical Analyzer Do? (P.99)
Scan away blanks (b), newlines (nl), and tabs.
Can we define tokens for these?
 delim → blank | tab | newline
 ws → delim+   (ws is whitespace)
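Example 3.9: Unsigned Numbers (P.102)
 12 (start) on digit → 13; 13 loops on digit; 13 on '.' → 14; 14 on digit → 15; 15 loops on digit; 15 on E → 16; 16 on '+' or '-' → 17; 16 on digit → 18; 17 on digit → 18; 18 loops on digit; 18 on other → 19*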
Note: Each token has a unique token identifier that defines a category of lexemes. In Fig.3.10 the columns are Regular Expression, Token, and Attribute-Value; the regular expressions listed are ws, if, then, else, id, num, <, <=, =, <>, >, >=.
Fig.3.14 Transition diagram for unsigned numbers in Pascal.
QUESTION:
What would the transition diagram (TD) look like for strings that contain each vowel exactly once, in strict lexicographic order?
Constructing Transition Diagrams for Tokens (P.99)
• Transition diagrams (TDs) are used to represent the tokens.
• As characters are read, the relevant TDs are used to try to match the lexeme to a pattern.
• Each TD has states, actions, a start state, and one or more final states.

Character   Escape Sequence
 "          \"
 '          \'
 ?          \?
 \          \\
 BEL        \a
 BS         \b
 FF         \f
 NL         \n
 CR         \r
 HT         \t
 VT         \v
Fig.3.11 Transition diagram for >=.
Example 3.7: All RELOPs (P.101)
 0 (start) on '<' → 1; 1 on '=' → 2: return(relop, LE)
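 Each state of the TD loops on cons; the vowels A, E, I, O, U each lead to the next state, in that order; after U, other leads to accept, and any unexpected character leads to error.
Note: The error path is taken if the next character is neither a cons nor the next vowel in lexicographic order.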
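Because each state only has to remember how many vowels have been seen, the answer's TD can be coded with a single counter. This C sketch assumes the input consists only of upper-case letters; any other character, a repeated vowel, or an out-of-order vowel takes the error path. The name vowels_in_order is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Accepts cons* A cons* E cons* I cons* O cons* U cons*,
   where cons is any upper-case consonant (assumption: input is 'A'-'Z'). */
bool vowels_in_order(const char *s)
{
    static const char vowels[] = "AEIOU";
    int state = 0;                      /* number of vowels seen so far, 0..5 */

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        char c = *s;
        if (!isupper((unsigned char)c))
            return false;               /* not an upper-case letter: error */
        if (strchr(vowels, c) == NULL)
            continue;                   /* consonant: loop on the current state */
        if (state < 5 && c == vowels[state])
            state++;                    /* the next vowel in order: advance */
        else
            return false;               /* repeated or out-of-order vowel: error */
    }
    return state == 5;                  /* end of input accepted only after U */
}
```

For instance, "FACETIOUS" is accepted, while "EAIOU" is rejected because E appears before A.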
Implementing Transition Diagrams (P.104)
lexeme_beginning = forward; state = 0;
FUNCTIONS USED
• Placing keywords in the symbol table is almost essential; this is coded by hand. Alternatively, keywords can be placed in a separate table called the keyword (reserved-word) table.
Examples 3.8 and 3.10: id and delim (P.101)
delim:
 28 (start) on delim → 29; 29 loops on delim; 29 on other → 30*
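The delim diagram amounts to a loop that stops at the first non-delimiter. A minimal C sketch, assuming delim is exactly blank, tab, and newline as in the ws definition, with the hypothetical names is_delim and skip_ws:

```c
#include <assert.h>
#include <stddef.h>

/* delim → blank | tab | newline */
static int is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Returns the number of leading delimiters in s. A result of 0 means the
   diagram fails (ws needs at least one delim). The "other" character that
   moves the diagram to state 30* is not consumed, which is the retract. */
size_t skip_ws(const char *s)
{
    size_t forward = 0;

    assert(s != NULL);
    while (is_delim(s[forward]))   /* state 29 loops on delim */
        forward++;
    return forward;                /* state 30*: s[forward] stays unread */
}
```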
id:
 9 (start) on letter → 10; 10 loops on letter or digit; 10 on other → 11*: return(get_token(), install_id())
 install_id() returns either a pointer to the symbol-table entry or 0 if the lexeme is a reserved word.
Fig.3.13 Transition diagram for identifiers and keywords.
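The id diagram and the "0 if reserved" convention can be sketched together in C. Assumptions: the symbol table is a hypothetical fixed-size array whose entry 0 is never used, the reserved words are if, then, and else, and install_id takes the lexeme as an argument (the slides' install_id() takes none and presumably reads the lexeme between lexeme_beginning and forward). The name match_id is also hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 64   /* hypothetical number of table entries */
#define MAX_LEN  32   /* hypothetical maximum lexeme length, including '\0' */

/* Hypothetical symbol table; entry 0 is unused so 0 can mean "reserved". */
static char symtab[MAX_SYMS][MAX_LEN];
static int nsyms = 1;

static const char *const reserved[] = { "if", "then", "else" };

/* Returns 0 for a reserved word; otherwise the index of the lexeme's
   entry, installing the lexeme first if it is not yet in the table. */
int install_id(const char *lexeme)
{
    assert(lexeme != NULL && strlen(lexeme) < MAX_LEN);
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(reserved[i], lexeme) == 0)
            return 0;
    for (int i = 1; i < nsyms; i++)
        if (strcmp(symtab[i], lexeme) == 0)
            return i;                  /* ID already in the symbol table */
    assert(nsyms < MAX_SYMS);
    strcpy(symtab[nsyms], lexeme);     /* install the new ID */
    return nsyms++;
}

/* States 9-11: letter (letter | digit)* followed by "other".
   Copies the lexeme into buf and returns its length, or 0 when s does
   not begin with a letter (the diagram fails). */
size_t match_id(const char *s, char *buf, size_t bufsize)
{
    size_t forward;

    assert(s != NULL && buf != NULL);
    if (!isalpha((unsigned char)s[0]))
        return 0;                      /* state 9 has no edge for this char */
    forward = 1;                       /* now in state 10 */
    while (isalnum((unsigned char)s[forward]))
        forward++;                     /* state 10 loops on letter or digit */
    /* state 11*: s[forward] is "other" and stays unread */
    assert(forward < bufsize);
    memcpy(buf, s, forward);
    buf[forward] = '\0';
    return forward;
}
```

After match_id copies "count1" out of "count1 := 0", install_id("count1") installs it as entry 1 and returns 1 again on later lookups, while install_id("then") returns 0.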
• When a token is recognized, one of the following must be done:
 – If it is a keyword: return 0.
 – If it is an ID already in the symbol table: return its symbol-table entry.
 – If it is an ID not yet in the symbol table: install it and return the new symbol-table entry.
Question: how should the level be described?
 States: represented by circles
 Actions: represented by arrows between states
 Start state: beginning of a pattern (marked by an arrowhead)
 Final state(s): end of a pattern (concentric circles)
• Each TD is deterministic: there is never a need to choose between two different actions.
Example TDs (P.100)
>=:
 0 (start) on '>' → 6; 6 on '=' → 7: RTN(GE); 6 on other → 8*: RTN(GT)
 * means: we have accepted ">" and have read one extra character, which must be unread.
Answer:
 cons → B | C | D | F | … | Z
 string → cons* A cons* E cons* I cons* O cons* U cons*
 In the TD, the start state loops on cons and moves to the next state on A.
nextchar(), forward(), retract(), install_num(), install_id(), gettoken(), isdigit(), isletter(), recover()
token nexttoken()
{
    while (1) {                        /* repeat until a "return" occurs */
        switch (state) {
        case 0:                        /* start state of the relop diagram */
            c = nextchar();            /* c is the lookahead character */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;    /* advance beginning of lexeme */
            }
            else if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else state = fail();
            break;
        … /* cases 1-8 here */
        }
    }
}
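The slides list fail() and recover() without showing them. A minimal sketch, assuming the diagrams are tried in the order relop (start state 0), id (9), and then the three number diagrams (12, 20, 25), longest pattern first; the global variables and the body of recover() are hypothetical (the slides list forward() as a function, modeled here as an index variable).

```c
#include <assert.h>

/* Hypothetical scanner globals, named after those on the slides. */
static int lexeme_beginning = 0;  /* index where the current lexeme starts */
static int forward = 0;           /* index of the lookahead character */
static int start = 0;             /* start state of the diagram being tried */

/* Hypothetical recovery: discard one character and restart with relop. */
static void recover(void)
{
    lexeme_beginning++;
    forward = lexeme_beginning;
    start = 0;
}

/* Called when the current diagram has no edge for the lookahead character:
   reset forward to the beginning of the lexeme and return the start state
   of the next diagram to try. */
int fail(void)
{
    forward = lexeme_beginning;
    switch (start) {
    case 0:  start = 9;  break;   /* relop failed: try id */
    case 9:  start = 12; break;   /* id failed: number with fraction and exponent */
    case 12: start = 20; break;   /* number with fraction only */
    case 20: start = 25; break;   /* integer */
    case 25: recover();  break;   /* nothing matched: lexical error */
    default: assert(!"invalid start state");
    }
    return start;
}
```

The fixed order is what answers the earlier question about unsigned numbers: if diagram 25 were tried before 12, the input "3.14" would produce the integer 3 and leave ".14" for the next token.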
3.4 Token Recognition (P.98)
How can we use the concepts developed so far to help recognize the tokens of a source language?
Assume the following tokens: if, then, else, relop, id, num. What language construct are they used for?
Given the tokens, what are the patterns?
 if     → if
 then   → then
 else   → else
 relop  → < | <= | > | >= | = | <>
 id     → letter ( letter | digit )*
 num    → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
What does this represent? What is ε?
Example 3.6 Grammar:
 stmt → if expr then stmt
      | if expr then stmt else stmt
      | ε
 expr → term relop term
      | term
 term → id
      | num