Real Scanner
A. First Edition
This is the first edition of my real scanner, which is just what the title implies: a practical scanner. It
has been my headache.
Design a lexical analyzer for the tokens of the programming language AGB.
The tokens of AGB are the following.
Identifiers, numbers, strings:
identifier = letter (letter | digit)*
number = digit (digit)*
string = " (~")* " | ' (~')* '
with letter = A - Z, digit = 0 - 9, where ~c stands for any character other than c. Strings are delimited by
" or ' characters. Any ASCII character may appear in a string, including { and }, but excluding " (in strings
delimited by ") or ' (in strings delimited by ').
Single-character symbols:
+ - * / ^ ! , ; < > = # ( ) [ ]
Multiple-character symbols:
:= :=: <= >= ,...,
Reserved words:
AND BOOLEAN ELSE FOR MOD PROCEDURE TRUE
ARRAY DIVIDES END IF NOT REM VALUE
BEGIN DO FALSE INTEGER OR THEN WHILE
Bookkeeping:
bof eof error
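The token definitions above can be checked with a small longest-match routine. This is a minimal sketch, not the table-driven DFA used in scanner.cpp: nextToken, Kind and Token are hypothetical names for illustration, and the routine assumes the caller has already skipped white space so that s[pos] starts a token. Multi-character symbols are tried first, so ":=:" wins over ":=" and ",...," wins over ",". Unlike startNumberCheck in the listing, it does not reject a letter right after a number.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical token classes mirroring the AGB definitions above.
enum class Kind { Identifier, Number, String, Symbol, Reserved, Error };

struct Token {
    Kind kind;
    std::string text;  // strings are returned without their quotes
};

// AGB letters are upper case only: letter = A - Z, digit = 0 - 9.
bool isAgbLetter(char c) { return c >= 'A' && c <= 'Z'; }
bool isAgbDigit(char c) { return c >= '0' && c <= '9'; }

// Reads the token that starts at s[pos] and advances pos past it.
Token nextToken(const std::string& s, std::size_t& pos) {
    assert(pos < s.size());  // the caller skips white space and stops at the end
    static const std::set<std::string> reserved = {
        "AND", "BOOLEAN", "ELSE", "FOR", "MOD", "PROCEDURE", "TRUE",
        "ARRAY", "DIVIDES", "END", "IF", "NOT", "REM", "VALUE",
        "BEGIN", "DO", "FALSE", "INTEGER", "OR", "THEN", "WHILE"};
    // Multi-character symbols come first so the longest match wins.
    static const std::vector<std::string> symbols = {
        ",...,", ":=:", ":=", "<=", ">=",
        "+", "-", "*", "/", "^", "!", ",", ";", "<", ">", "=", "#",
        "(", ")", "[", "]"};

    const std::size_t start = pos;
    const char c = s[start];
    if (isAgbLetter(c)) {  // identifier = letter (letter | digit)*
        while (pos < s.size() && (isAgbLetter(s[pos]) || isAgbDigit(s[pos]))) ++pos;
        std::string word = s.substr(start, pos - start);
        return {reserved.count(word) ? Kind::Reserved : Kind::Identifier, word};
    }
    if (isAgbDigit(c)) {  // number = digit (digit)*
        while (pos < s.size() && isAgbDigit(s[pos])) ++pos;
        return {Kind::Number, s.substr(start, pos - start)};
    }
    if (c == '"' || c == '\'') {  // the same quote character closes the string
        const std::size_t close = s.find(c, start + 1);
        if (close == std::string::npos) {  // unterminated string
            pos = s.size();
            return {Kind::Error, s.substr(start)};
        }
        pos = close + 1;
        return {Kind::String, s.substr(start + 1, close - start - 1)};
    }
    for (const std::string& sym : symbols) {
        if (s.compare(start, sym.size(), sym) == 0) {
            pos += sym.size();
            return {Kind::Symbol, sym};
        }
    }
    ++pos;  // skip the character nobody recognizes
    return {Kind::Error, std::string(1, c)};
}
```

For example, on "X1>=" the first call returns the identifier X1 and leaves pos at 2, and the second call returns the symbol >=. Since the quote that opened a string is the only one searched for, { and } inside a string need no special handling.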
A couple of weeks ago I wrote a pre-test sample scanner, and almost all of my ideas here are based on it.
I strictly follow the DFA model and simply translate each transition function into a C++ function. As
for OO, I don't really need it. You see, I am such a fanatic of function arrays. To use one, I have to
define a series of global functions so that I can put them into an array of function pointers. My main
function "next" then calls them repeatedly; each return value is the next state, which is also the index
of the next function to call. Another painful thing was feof on "FILE", which I strongly suspected was a
bug. (It is documented behavior, not a bug: feof, like fstream's eof(), only reports end of file after a
read has already failed, so a loop that tests it before reading handles the last character twice.) When I
ran into it again, I soon gave up and switched to fstream.
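The function-array idea fits in a few lines. Below is a minimal sketch, not the real scanner.cpp: a hypothetical two-token DFA (identifiers and numbers only; State, table and runDfa are names made up for illustration) in which each non-final state owns one transition function, the functions sit in an array in the same order as the enum, and the driver keeps calling table[state] until it gets a final state, the way next() does with DFA[].

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mini-DFA: identifiers (A-Z followed by A-Z/0-9) and numbers only.
// The non-final states come first so they can index the transition table.
enum State { Start, InIdent, InNumber, DoneIdent, DoneNumber, Fail };
const int TransitionCount = 3;  // Start, InIdent and InNumber

bool isUpper(char c) { return c >= 'A' && c <= 'Z'; }
bool isDigitChar(char c) { return c >= '0' && c <= '9'; }

// One transition function per non-final state; each returns the next state.
State startStep(char c) {
    if (isUpper(c)) return InIdent;
    if (isDigitChar(c)) return InNumber;
    return Fail;
}
State identStep(char c) { return (isUpper(c) || isDigitChar(c)) ? InIdent : DoneIdent; }
State numberStep(char c) { return isDigitChar(c) ? InNumber : DoneNumber; }

// The array order must follow the enum: table[Start] is startStep, and so on.
State (*table[TransitionCount])(char) = {startStep, identStep, numberStep};

bool isFinalState(State s) { return s == DoneIdent || s == DoneNumber || s == Fail; }

// Runs the DFA from text[pos] and returns the final state. For a finished
// identifier or number, pos is left on the lookahead character that ended it
// (not consumed), like lastChar in Scanner::next().
State runDfa(const std::string& text, std::size_t& pos) {
    State s = Start;
    for (;;) {
        assert(s < TransitionCount);  // only non-final states index the table
        // '\0' stands in for the end of input so a trailing token still closes
        const char c = pos < text.size() ? text[pos] : '\0';
        const State next = table[s](c);
        if (isFinalState(next)) {
            if (next == Fail) ++pos;  // consume the offending character
            return next;
        }
        s = next;
        ++pos;
    }
}
```

Keeping the final states out of the table is what lets the driver stop without a bounds problem; the same ordering rule is why DFA[] in scanner.cpp must list its 22 functions in exactly the order of the first 22 StateKind values.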
In fstream I tried the function "get". It does hand back the newline characters, but my scanner did not
recognize them, so I had to add characters 10 (line feed) and 13 (carriage return) to the "white space"
set. It is totally insane!
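Here is a small sketch of a reading loop that avoids both problems, assuming the input comes from any std::istream (the std::istringstream below is only for trying it out). collapseBlanks and isBlank are hypothetical helpers for illustration, not part of scanner.cpp.

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream, for trying the helper out
#include <string>

// White space as the scanner should see it: blank, tab, line feed (10)
// and carriage return (13).
bool isBlank(char ch) {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
}

// Copies the input, turning every run of white space between words into a
// single blank (a hypothetical stand-in for the scanner's skipping logic).
std::string collapseBlanks(std::istream& in) {
    std::string out;
    bool pendingBlank = false;
    char ch;
    // get() hands back every character, blanks and new lines included
    // (operator>> would skip them). Testing the read itself, instead of
    // calling eof() first, stops right after the failed read at end of file,
    // so the last character is never processed twice.
    while (in.get(ch)) {
        if (isBlank(ch)) {
            pendingBlank = !out.empty();  // ignore blanks before the first word
        } else {
            if (pendingBlank) out += ' ';
            out += ch;
            pendingBlank = false;
        }
    }
    assert(in.eof());  // the loop should only end because the input ran out
    return out;
}
```

With this loop, "IF  X\r\n\tTHEN\n" comes out as "IF X THEN": the carriage return, line feed and tab are all treated as ordinary white space.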
I also added a strange feature: nested comments. Can you imagine any real use for it? I doubt there is
one. It is simply a kind of stack.
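As far as I can tell, the comment states in the listing (startCommentCheck, insideCommentCheck, endBracketCheck) close a comment at the first '}', so real nesting needs one more piece of state. A depth counter is enough, because the only thing ever pushed onto the "stack" is '{'. This is a minimal sketch; skipNestedComment is a hypothetical name, and it works on a std::string rather than on the scanner's stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Skips a possibly nested { ... } comment that starts at text[pos] == '{'.
// The depth counter plays the role of a stack that only ever holds '{'.
// Returns true and leaves pos just past the matching '}', or returns false
// (with pos at the end of text) if the comment is never closed.
bool skipNestedComment(const std::string& text, std::size_t& pos) {
    assert(pos < text.size() && text[pos] == '{');
    int depth = 0;
    while (pos < text.size()) {
        const char ch = text[pos++];
        if (ch == '{') {
            ++depth;  // push
        } else if (ch == '}') {
            if (--depth == 0) return true;  // pop; the outermost comment closed
        }
    }
    return false;
}
```

On "{ { a test} } X" the call returns true with pos just past the second '}', which is the behavior the first line of the results below shows.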
Sometimes I wonder whether anybody will ever be interested enough to read this stuff. So, am I writing
all this just for myself? I guess so. Sometimes my wild imagination even suggests that this could be an
e-record discovered by future archaeologists who are themselves huge computer systems. These computers are
rather curious about how they were invented and how they evolved. Besides digging through piles of books,
they might also search all those hard disks in various internet servers. By chance, one of them might find
these words and realize that as early as the beginning of the 21st century a stupid guy had anticipated
what would happen hundreds, or maybe thousands, of years later.
I have always considered the compiler one of the greatest achievements in the history of computing. It is
the communication tool between computer and human, and an even more sophisticated channel should be
established between humans and machines. I am willing to devote my whole life to being such an agent, or
interpreter, or whatever you want to call it. This is really what I mean when I say my journey is the
oceans of stars.
However, this pilgrimage is such a grim road that I often feel heartbroken and hopeless. Because what if
I am not the ONE? I understand exactly how Neo feels in The Matrix when Morpheus tells him that he is the
saviour of the world, except that I try to fool myself by insisting that I should be the one. I dare not
imagine what will happen if I finally find out the truth. Or will I even care? I mean, life is simply a
procedure. No matter what happens inside the procedure, it ends in much the same way for everyone. Should
I care about it?
The day before yesterday I watched a Korean film titled <My barbarian girlfriend>. It is a heart-touching
film that uncovers a lot of time-washed scenes from the last century, from a time when my blood was still
hot enough to burst out of my veins. If I were given the choice, I would rather occupy myself with more
coding to forget about all that chaos. Yesterday I attended a party at my landlord's home, and I had the
same feeling in my heart: the more exciting my surroundings are, the lonelier I feel. It is such a
heart-chilling loneliness that I would rather look for shelter in machines' languages.
E. Further improvement
F. File listing
1. scanner.h
2. scanner.cpp
3. tiny.cpp (main)
file name: scanner.h
#ifndef SCANNER_H
#define SCANNER_H

#include <iostream>
#include <fstream>

using namespace std;

const int MaxTokenLength=1024;
const int StateNumber=22;
const int SingleCount=16;
const int MultiCount=5;
const int LetterCount=26;
const int DigitCount=10;
const int ReservedCount=21;

extern char buffer[MaxTokenLength+1];//in order to display as string

enum TokenKind {IDKind, NumberKind, StringKind, SingleKind, MultiKind, ReservedKind,
    CommentKind, ErrorKind, eof, bof};

//the first StateNumber (22) states have a transition function in DFA[],
//in exactly this order; the states from EndSingle on are final
enum StateKind {Ready, SingleSymbol, BothSymbol, InsideMulti1, InsideMulti2, InsideMulti3,
    BothEqual, BothPeriod1, BothPeriod2, BothPeriod3, BothPeriod4, StartID, StartNumber,
    StartComment, StartSingleStr, StartDoubleStr, InsideSingleStr, InsideDoubleStr,
    EndSingleQuote, EndDoubleQuote, InsideComment, EndBracket,
    EndSingle, EndMulti, EndID, EndNumber, EndComment, EndDoubleStr, EndSingleStr, Error};

enum CharSet {SingleBegins, BothBegins, MultiBegins, WhiteSpace, Letters, Digits,
    CommentBegins, SingleQuote, DoubleQuote, Others};

class Scanner
{
private:
    char lastChar;
    ifstream stream;
    bool readChar(char& ch);
    bool isFinal(StateKind state, TokenKind& kind);
    bool isReserved();
public:
    bool openFile(const char* fileName);
    TokenKind next();
    char* output() { return buffer; }
};

#endif
file name: scanner.cpp
#include <iostream>
#include <cstring>//strcmp
#include "scanner.h"

using namespace std;

int counter=0;
char buffer[MaxTokenLength+1];

//string literals are constant, so the tables hold const char*
const char* tokenName[8]={"identifier", "number", "string", "single-symbol", "multi-symbol",
    "reserved", "comment", "error"};
char single[SingleCount]={'+', '-', '*', '/', '^', '!', ',', ';', '<', '>', '=', '#', '(', ')', '[', ']'};
char multiBeginning[MultiCount-1]={':', '<', '>', ','};
const char* reserved[ReservedCount]={"AND", "BOOLEAN", "ELSE", "FOR", "MOD", "PROCEDURE", "TRUE",
    "ARRAY", "DIVIDES", "END", "IF", "NOT", "REM", "VALUE",
    "BEGIN", "DO", "FALSE", "INTEGER", "OR", "THEN", "WHILE"};

StateKind readyCheck(char ch);
StateKind singleSymbolCheck(char ch);
StateKind bothSymbolCheck(char ch);
StateKind insideMulti1Check(char ch);
StateKind insideMulti2Check(char ch);
StateKind insideMulti3Check(char ch);
StateKind bothEqualCheck(char ch);
StateKind bothPeriod1Check(char ch);
StateKind bothPeriod2Check(char ch);
StateKind bothPeriod3Check(char ch);
StateKind bothPeriod4Check(char ch);
StateKind startIDCheck(char ch);
StateKind startNumberCheck(char ch);
StateKind startCommentCheck(char ch);
StateKind startSingleStrCheck(char ch);
StateKind startDoubleStrCheck(char ch);
StateKind insideSingleStrCheck(char ch);
StateKind insideDoubleStrCheck(char ch);
StateKind endSingleQuoteCheck(char ch);
StateKind endDoubleQuoteCheck(char ch);
StateKind insideCommentCheck(char ch);
StateKind endBracketCheck(char ch);

//the order must match the first StateNumber values of StateKind
StateKind (*DFA[StateNumber])(char ch)=
{
    readyCheck, singleSymbolCheck, bothSymbolCheck, insideMulti1Check, insideMulti2Check,
    insideMulti3Check, bothEqualCheck, bothPeriod1Check, bothPeriod2Check, bothPeriod3Check,
    bothPeriod4Check, startIDCheck, startNumberCheck, startCommentCheck, startSingleStrCheck,
    startDoubleStrCheck, insideSingleStrCheck, insideDoubleStrCheck, endSingleQuoteCheck,
    endDoubleQuoteCheck, insideCommentCheck, endBracketCheck
};

bool isLoop(StateKind state);
bool isLetter(char ch);
bool isDigit(char ch);
bool isWhiteSpace(char ch);
CharSet checkChar(char ch);

bool isLoop(StateKind state)
{
    return state==Ready||state==StartSingleStr||state==StartDoubleStr
        ||state==EndSingleQuote||state==EndDoubleQuote||state==StartComment
        ||state==EndBracket;
}

bool isWhiteSpace(char ch)
{
    //blank, tab, line feed (10) and carriage return (13)
    return ch==' '||ch=='\t'||ch=='\n'||ch=='\r';
}

CharSet checkChar(char ch)
{
    if (isLetter(ch))
    {
        return Letters;
    }
    if (isDigit(ch))
    {
        return Digits;
    }
    if (isWhiteSpace(ch))
    {
        return WhiteSpace;
    }
    switch(ch)
    {
    case '+': case '-': case '*': case '/': case '^': case '!': case ';':
    case '=': case '#': case '(': case ')': case '[': case ']':
        return SingleBegins;
    case ':':
        return MultiBegins;
    case '<': case '>': case ',':
        return BothBegins;
    case '{':
        return CommentBegins;
    case '\'':
        return SingleQuote;
    case '"':
        return DoubleQuote;
    default:
        return Others;
    }
}

bool Scanner::openFile(const char* fileName)
{
    stream.open(fileName, ios::in);
    if (!stream.is_open())
    {
        cout<<"cannot open "<<fileName<<"\n";
        return false;
    }
    if (readChar(lastChar))
    {
        cout<<"ready to scan! boss!\n";
        return true;
    }
    else
    {
        cout<<"the file is empty\n";
        return false;
    }
}

bool Scanner::isReserved()
{
    for (int i=0; i<ReservedCount; i++)
    {
        if (strcmp(buffer, reserved[i])==0)
        {
            return true;
        }
    }
    return false;
}

bool Scanner::isFinal(StateKind state, TokenKind& kind)
{
    switch (state)
    {
    case EndSingle: kind=SingleKind; return true;
    case EndMulti: kind=MultiKind; return true;
    case EndID: kind=IDKind; return true;
    case EndNumber: kind=NumberKind; return true;
    case EndDoubleStr: kind=StringKind; return true;
    case EndSingleStr: kind=StringKind; return true;
    case EndComment: kind=CommentKind; return true;
    case Error: kind=ErrorKind; return true;
    default: return false;
    }
}

bool Scanner::readChar(char& ch)
{
    //get() keeps blanks and new lines (operator>> would skip them), and the
    //stream tests false once a read has run past the end of the file, so the
    //last character is not handed back twice
    if (stream.get(ch))
    {
        return true;
    }
    return false;
}

TokenKind Scanner::next()
{
    TokenKind kind=eof;
    StateKind state=Ready;
    counter=0;
    do
    {
        state=DFA[state](lastChar);
        if (isFinal(state, kind))//error is also considered to be final
        {
            buffer[counter]='\0';
            if (kind==IDKind&&isReserved())
            {
                kind=ReservedKind;
            }
            if (state==Error)
            {
                cout<<"encounter error\n";
            }
            return kind;
        }
        //begin counting, and I don't want to output comment,
        //because comment may be very, very long....
        if (!isLoop(state)&&state!=InsideComment)
        {
            //it begins
            buffer[counter]=lastChar;
            counter++;
            if (counter>MaxTokenLength)
            {
                cout<<"Max Token Length Reached!\n";
                return ErrorKind;
            }
        }
        cout<<lastChar;//output anyway
    }while (readChar(lastChar));
    if (kind!=eof)
    {
        return ErrorKind;
    }
    return kind;//default is eof
}

StateKind readyCheck(char ch)
{
    //starting state must be ready
    switch(checkChar(ch))
    {
    case WhiteSpace: return Ready;
    case SingleBegins: return SingleSymbol;
    case BothBegins: return BothSymbol;
    case MultiBegins: return InsideMulti1;
    case Letters: return StartID;
    case Digits: return StartNumber;
    case SingleQuote: return StartSingleStr;
    case DoubleQuote: return StartDoubleStr;
    case CommentBegins: return StartComment;
    default: return Error;
    }
}

StateKind startIDCheck(char ch)
{
    if (isLetter(ch)||isDigit(ch))
    {
        return StartID;
    }
    else
    {
        return EndID;
    }
}

StateKind endDoubleQuoteCheck(char ch)
{
    return EndDoubleStr;
}

StateKind startCommentCheck(char ch)
{
    if (ch=='}')
    {
        //empty comment: consume the '}' first, just like a non-empty comment
        return EndBracket;
    }
    else
    {
        return InsideComment;
    }
}

StateKind startNumberCheck(char ch)
{
    if (isDigit(ch))
    {
        return StartNumber;
    }
    if (isLetter(ch))
    {
        return Error;
    }
    return EndNumber;
}

StateKind endSingleQuoteCheck(char ch)
{
    return EndSingleStr;
}

StateKind bothEqualCheck(char ch)
{
    if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
    {
        return EndMulti;
    }
    else
    {
        return Error;
    }
}

StateKind bothPeriod1Check(char ch)
{
    if (ch=='.')
    {
        return BothPeriod2;
    }
    else
    {
        return Error;
    }
}

StateKind bothPeriod2Check(char ch)
{
    if (ch=='.')
    {
        return BothPeriod3;
    }
    else
    {
        return Error;
    }
}

StateKind bothPeriod3Check(char ch)
{
    if (ch==',')
    {
        return BothPeriod4;
    }
    else
    {
        return Error;
    }
}

StateKind bothPeriod4Check(char ch)
{
    if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
    {
        return EndMulti;
    }
    else
    {
        return Error;
    }
}

StateKind bothSymbolCheck(char ch)
{
    if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
    {
        return EndSingle;
    }
    else
    {
        if ((buffer[0]=='<'||buffer[0]=='>')&&(ch=='='))
        {
            return BothEqual;
        }
        if (buffer[0]==','&&ch=='.')
        {
            return BothPeriod1;
        }
        //default
        return Error;
    }
}

StateKind singleSymbolCheck(char ch)
{
    return EndSingle;
}

StateKind startSingleStrCheck(char ch)
{
    if (ch=='\'')
    {
        //empty string: consume the closing quote first, like a non-empty string
        return EndSingleQuote;
    }
    else
    {
        return InsideSingleStr;
    }
}

StateKind insideCommentCheck(char ch)
{
    if (ch=='}')
    {
        return EndBracket;
    }
    else
    {
        return InsideComment;
    }
}

StateKind startDoubleStrCheck(char ch)
{
    if (ch=='"')
    {
        //empty string: consume the closing quote first, like a non-empty string
        return EndDoubleQuote;
    }
    else
    {
        return InsideDoubleStr;
    }
}

StateKind insideMulti1Check(char ch)
{
    if (ch=='=')
    {
        return InsideMulti2;
    }
    else
    {
        return Error;
    }
}

StateKind insideMulti2Check(char ch)
{
    if (ch==':')
    {
        return InsideMulti3;
    }
    else
    {
        if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
        {
            return EndMulti;
        }
        else
        {
            return Error;
        }
    }
}

StateKind insideMulti3Check(char ch)
{
    if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
    {
        return EndMulti;
    }
    else
    {
        return Error;
    }
}

StateKind endBracketCheck(char ch)
{
    return EndComment;
}

StateKind insideSingleStrCheck(char ch)
{
    if (ch=='\'')
    {
        return EndSingleQuote;
    }
    else
    {
        return InsideSingleStr;
    }
}

StateKind insideDoubleStrCheck(char ch)
{
    if (ch=='"')
    {
        return EndDoubleQuote;
    }
    else
    {
        return InsideDoubleStr;
    }
}

bool isLetter(char ch)
{
    return ch>='A'&&ch<='Z';
}

bool isDigit(char ch)
{
    return ch>='0'&&ch<='9';
}
file name: tiny.cpp (main)
#include <iostream>
#include <fstream>
#include "scanner.h"

using namespace std;

extern const char* tokenName[8];

int main()
{
    /*
    //this is a joke, as I want to change all character to capital
    char ch;
    ifstream in;
    ofstream out;
    in.open("c:\\sourcecode.txt", ios::in);
    out.open("c:\\newsourcecode.txt", ios::out);
    while (in.get(ch))//testing the read itself avoids copying the last character twice
    {
        if (ch>='a'&&ch<='z')
        {
            ch-='a'-'A';
        }
        out<<ch;
    }
    in.close();
    out.close();
    */
    Scanner S;
    TokenKind kind;
    if (!S.openFile("c:\\sourcecode.txt"))
    {
        return 1;
    }
    kind=S.next();
    while (kind!=eof&&kind!=ErrorKind)
    {
        cout<<"\nthe token type is:"<<tokenName[kind]<<" value:"<<S.output()<<endl;
        kind=S.next();
    }
    return 0;
}
Here is the result. The input file is "c:\sourcecode.txt". I feel kind of insane for trying to implement
"nested comments", which are by all means a kind of garbage! Why did I try to do it? I really have no
idea. The input file is part of this program itself, converted to capitals, and in order to satisfy the
strange requirements I had to replace every "::" and '{' with some other symbol, like '<'. I highly doubt
anybody would be interested in the following results.
ready to scan! boss!
{ { a test} }
the token type is:comment value:
CHARSET
the token type is:identifier value:CHARSET
CHECKCHAR
the token type is:identifier value:CHECKCHAR
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISLETTER(CH)) { RETURN LETTERS; } IF (ISDIGIT(CH)) { RETURN DIGITS; } IF (ISWHITESPACE(CH)) { RETURN WHITESPACE; } SWITCH(CH) { CASE '+': RETURN SINGLEBEGINS; CASE '-': RETURN SINGLEBEGINS; CASE '*': RETURN SINGLEBEGINS; CASE '/': RETURN SINGLEBEGINS; CASE '^': RETURN SINGLEBEGINS; CASE '!': RETURN SINGLEBEGINS; CASE ';': RETURN SINGLEBEGINS; CASE '=': RETURN SINGLEBEGINS; CASE '#': RETURN SINGLEBEGINS; CASE '(': RETURN SINGLEBEGINS; CASE ')': RETURN SINGLEBEGINS; CASE '[': RETURN SINGLEBEGINS; CASE ':': RETURN MULTIBEGINS; CASE ']': RETURN SINGLEBEGINS; CASE '<': RETURN BOTHBEGINS; CASE '>': RETURN BOTHBEGINS; CASE ',': RETURN BOTHBEGINS; CASE ']': RETURN COMMENTBEGINS; CASE '\'': RETURN SINGLEQUOTE; CASE '"': RETURN DOUBLEQUOTE; DEFAULT: RETURN OTHERS; } }
the token type is:comment value:
BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
>
the token type is:single-symbol value:>
OPENFILE
the token type is:identifier value:OPENFILE
(
the token type is:single-symbol value:(
CONST
the token type is:identifier value:CONST
CHAR
the token type is:identifier value:CHAR
*
the token type is:single-symbol value:*
FILENAME
the token type is:identifier value:FILENAME
)
the token type is:single-symbol value:)
{ STREAM.OPEN(FILENAME, IOS::IN); IF (READCHAR(LASTCHAR)) { COUT<<"READY TO SCAN! BOSS!\N"; RETURN TRUE; } ELSE { COUT<<"THE FILE IS EMPTY\N"; RETURN FALSE; } }
the token type is:comment value:
BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
<
the token type is:single-symbol value:<
ISRESERVED
the token type is:identifier value:ISRESERVED
(
the token type is:single-symbol value:(
)
the token type is:single-symbol value:)
{ FOR (INT I=0; I<RESERVEDCOUNT; I++) { IF (STRCMP(BUFFER, RESERVED[I])==0) { RETURN TRUE; } } RETURN FALSE; }
the token type is:comment value:
BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
>
the token type is:single-symbol value:>
ISFINAL
the token type is:identifier value:ISFINAL
(
the token type is:single-symbol value:(
STATEKIND
the token type is:identifier value:STATEKIND
STATE
the token type is:identifier value:STATE
,
the token type is:single-symbol value:,
TOKENKIND
the token type is:identifier value:TOKENKIND
KIND
the token type is:identifier value:KIND
)
the token type is:single-symbol value:)
{ SWITCH (STATE) { CASE ENDSINGLE: KIND=SINGLEKIND; RETURN TRUE; CASE ENDMULTI: KIND=MULTIKIND; RETURN TRUE; CASE ENDID: KIND=IDKIND; RETURN TRUE; CASE ENDNUMBER: KIND=NUMBERKIND; RETURN TRUE; CASE ENDDOUBLESTR: KIND=STRINGKIND; RETURN TRUE; CASE ENDSINGLESTR: KIND=STRINGKIND; RETURN TRUE; CASE ENDCOMMENT: KIND=COMMENTKIND; RETURN TRUE; CASE ERROR: KIND=ERRORKIND; RETURN TRUE; DEFAULT: RETURN FALSE; } }
the token type is:comment value:
BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
<
the token type is:single-symbol value:<
READCHAR
the token type is:identifier value:READCHAR
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (STREAM.EOF()) { RETURN FALSE; } ELSE { STREAM>>CH; RETURN TRUE; } }
the token type is:comment value:
TOKENKIND
the token type is:identifier value:TOKENKIND
SCANNER
the token type is:identifier value:SCANNER
>
the token type is:single-symbol value:>
NEXT
the token type is:identifier value:NEXT
(
the token type is:single-symbol value:(
)
the token type is:single-symbol value:)
{ TOKENKIND KIND=EOF; STATEKIND STATE=READY; COUNTER=0; DO { STATE=DFA[STATE](LASTCHAR); IF (ISFINAL(STATE, KIND))//ERROR IS ALSO CONSIDER TO BE FINAL { BUFFER[COUNTER]='\0'; IF (KIND==IDKIND&&ISRESERVED()) { KIND =RESERVEDKIND; } IF (STATE==ERROR) { COUT<<"ENCOUNTER ERROR\N"; } RETURN KIND; } //BEGIN COUNTING, AND I DON'T WANT TO OUTPUT COMMENT, //BECAUSE COMMENT MAY BE VERY, VERY LONG.... IF (!ISLOOP(STATE)&&STATE!=INSIDECOMMENT) { //IT BEGINS BUFFER[COUNTER]=LASTCHAR; COUNTER++; IF (COUNTER>MAXTOKENLENGTH) { COUT<<"MAX TOKEN LENGTH REACHED!\N"; RETURN ERRORKIND; } } COUT<<LASTCHAR;//OUTPUT ANYWAY }WHILE (READCHAR(LASTCHAR)); IF (KIND!=EOF) { RETURN ERRORKIND; } RETURN KIND;//DEFAULT IS EOF }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
READYCHECK
the token type is:identifier value:READYCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ //STARTING STATE MUST BE READY SWITCH(CHECKCHAR(CH)) { CASE WHITESPACE: RETURN READY; CASE SINGLEBEGINS: RETURN SINGLESYMBOL; CASE BOTHBEGINS: RETURN BOTHSYMBOL; CASE MULTIBEGINS: RETURN INSIDEMULTI1; CASE LETTERS: RETURN STARTID; CASE DIGITS: RETURN STARTNUMBER; CASE SINGLEQUOTE: RETURN STARTSINGLESTR; CASE DOUBLEQUOTE: RETURN STARTDOUBLESTR; CASE COMMENTBEGINS: RETURN STARTCOMMENT; DEFAULT: RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
STARTIDCHECK
the token type is:identifier value:STARTIDCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISLETTER(CH)||ISDIGIT(CH)) { RETURN STARTID; } ELSE { RETURN ENDID; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
ENDDOUBLEQUOTECHECK
the token type is:identifier value:ENDDOUBLEQUOTECHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ RETURN ENDDOUBLESTR; }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
STARTCOMMENTCHECK
the token type is:identifier value:STARTCOMMENTCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='') { RETURN ENDCOMMENT; } ELSE { RETURN INSIDECOMMENT; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
STARTNUMBERCHECK
the token type is:identifier value:STARTNUMBERCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISDIGIT(CH)) { RETURN STARTNUMBER; } IF (ISLETTER(CH)) { RETURN ERROR; } RETURN ENDNUMBER; }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
ENDSINGLEQUOTECHECK
the token type is:identifier value:ENDSINGLEQUOTECHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ RETURN ENDSINGLESTR; }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
BOTHEQUALCHECK
the token type is:identifier value:BOTHEQUALCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) { RETURN ENDMULTI; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD1CHECK
the token type is:identifier value:BOTHPERIOD1CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='.') { RETURN BOTHPERIOD2; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD2CHECK
the token type is:identifier value:BOTHPERIOD2CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='.') { RETURN BOTHPERIOD3; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD3CHECK
the token type is:identifier value:BOTHPERIOD3CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH==',') { RETURN BOTHPERIOD4; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD4CHECK
the token type is:identifier value:BOTHPERIOD4CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) { RETURN ENDMULTI; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
BOTHSYMBOLCHECK
the token type is:identifier value:BOTHSYMBOLCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) { RETURN ENDSINGLE; } ELSE { IF ((BUFFER[0]=='<'||BUFFER[0]=='>')&&(CH=='=')) { RETURN BOTHEQUAL; } IF (BUFFER[0]==','&&CH=='.') { RETURN BOTHPERIOD1; } //DEFAULT RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
SINGLESYMBOLCHECK
the token type is:identifier value:SINGLESYMBOLCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ RETURN ENDSINGLE; }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
STARTSINGLESTRCHECK
the token type is:identifier value:STARTSINGLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='\'') { RETURN ENDSINGLESTR; } ELSE { RETURN INSIDESINGLESTR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
INSIDECOMMENTCHECK
the token type is:identifier value:INSIDECOMMENTCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH==']') { RETURN ENDBRACKET; } ELSE { RETURN INSIDECOMMENT; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
STARTDOUBLESTRCHECK
the token type is:identifier value:STARTDOUBLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='"') { RETURN ENDDOUBLESTR; } ELSE { RETURN INSIDEDOUBLESTR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
INSIDEMULTI1CHECK
the token type is:identifier value:INSIDEMULTI1CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='=') { RETURN INSIDEMULTI2; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
INSIDEMULTI2CHECK
the token type is:identifier value:INSIDEMULTI2CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH==':') { RETURN INSIDEMULTI3; } ELSE { IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) { RETURN ENDMULTI; } ELSE { RETURN ERROR; } } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
INSIDEMULTI3CHECK
the token type is:identifier value:INSIDEMULTI3CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) { RETURN ENDMULTI; } ELSE { RETURN ERROR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
ENDBRACKETCHECK
the token type is:identifier value:ENDBRACKETCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ RETURN ENDCOMMENT; }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
INSIDESINGLESTRCHECK
the token type is:identifier value:INSIDESINGLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='\'') { RETURN ENDSINGLEQUOTE; } ELSE { RETURN INSIDESINGLESTR; } }
the token type is:comment value:
STATEKIND
the token type is:identifier value:STATEKIND
INSIDEDOUBLESTRCHECK
the token type is:identifier value:INSIDEDOUBLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ IF (CH=='"') { RETURN ENDDOUBLEQUOTE; } ELSE { RETURN INSIDEDOUBLESTR; } }
the token type is:comment value:
BOOL
the token type is:identifier value:BOOL
ISLETTER
the token type is:identifier value:ISLETTER
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ RETURN CH>='A'&&CH<='Z'; }
the token type is:comment value:
BOOL
the token type is:identifier value:BOOL
ISDIGIT
the token type is:identifier value:ISDIGIT
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)
{ RETURN CH>='0'&&CH<='9'; }
the token type is:comment value: