Real Scanner

A. First Edition
This is the first edition of my real scanner, which is just what the title implies: a practical scanner. It is
my headache.
B. The problem

Design a lexical analyzer for the tokens of the programming language AGB.

The tokens of AGB are the following.

 Identifiers, numbers, strings:

identifier = letter(letter|digit)*

number = digit(digit)*

string = "(~")*" | '(~')*'

with letter = A - Z, digit = 0 - 9.

Strings are delimited by " or ' characters. Any ASCII character may appear

in a string, including { and }, but excluding " (in strings defined by "(~")*")

or ' (in strings defined by '(~')*').

 Single-character symbols:

+ - * / ^ ! , ; < > = # ( ) [ ]

 Multiple-character symbols:

:= :=: <= >= ,...,

 Reserved words:

AND BOOLEAN ELSE FOR MOD PROCEDURE TRUE

ARRAY DIVIDES END IF NOT REM VALUE

BEGIN DO FALSE INTEGER OR THEN WHILE

 Bookkeeping:

bof eof error
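
The word-like tokens above can be told apart with a few lines of C++. This is only a minimal sketch under stated assumptions: the names WordClass, sortedReserved, lessCString and classifyWord are made up for illustration and do not appear in the scanner listed later, and the reserved words are re-sorted alphabetically here so that a binary search can be used.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// Hypothetical result type, for illustration only.
enum WordClass { Identifier, Number, Reserved, NotAWord };

// The 21 reserved words of AGB, in alphabetical order for binary search.
const char* const sortedReserved[] = {
    "AND", "ARRAY", "BEGIN", "BOOLEAN", "DIVIDES", "DO", "ELSE", "END",
    "FALSE", "FOR", "IF", "INTEGER", "MOD", "NOT", "OR", "PROCEDURE",
    "REM", "THEN", "TRUE", "VALUE", "WHILE"
};
const int sortedReservedCount = sizeof(sortedReserved) / sizeof(sortedReserved[0]);

// Ordering used by the binary search: plain strcmp order.
bool lessCString(const char* a, const char* b) { return std::strcmp(a, b) < 0; }

WordClass classifyWord(const std::string& s)
{
    if (s.empty()) return NotAWord;
    bool allDigits = true;
    bool allAlnum = true;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        bool digit = s[i] >= '0' && s[i] <= '9';
        bool letter = s[i] >= 'A' && s[i] <= 'Z';   // AGB letters are upper case only
        if (!digit) allDigits = false;
        if (!digit && !letter) allAlnum = false;
    }
    if (allDigits) return Number;                    // number = digit(digit)*
    bool startsWithLetter = s[0] >= 'A' && s[0] <= 'Z';
    if (!allAlnum || !startsWithLetter) return NotAWord;
    // identifier = letter(letter|digit)*, unless the spelling is reserved
    if (std::binary_search(sortedReserved, sortedReserved + sortedReservedCount,
                           s.c_str(), lessCString))
        return Reserved;
    return Identifier;
}
```

With these definitions, classifyWord("WHILE") yields Reserved, classifyWord("X1") yields Identifier, and classifyWord("1X") yields NotAWord because an identifier must start with a letter.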

 

C. The idea of the program
 

There is a pre-test sample scanner I wrote a couple of weeks ago, and almost all of my ideas are based on it.
I strictly follow the idea of a DFA and simply translate every transition function into a C++ function. As
for OO, I don't really need it. You see, I am such a fanatic of function arrays. To use one, I have to
define a series of global functions and put them in an array of function pointers. Then my major function
"next" calls them repeatedly, and each return value is the index of the next state, which is also the index
of the next function. Another painful thing is the feof of "FILE", which I highly suspected of being a bug.
I gave up soon after I ran into it again and switched to fstream.
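
The function-array idea can be shown on a much smaller machine. What follows is a minimal sketch, not the scanner itself: the names MiniState, Transition, miniDFA and firstTokenLength are invented for illustration, and the machine only knows words and numbers. Each non-final state owns one function, and the value that function returns is used directly as the index of the next function to call.

```cpp
#include <string>

// Hypothetical states: the first three have transition functions,
// the last two are final and stop the loop.
enum MiniState { Start, InWord, InNumber, Accept, Reject };

typedef MiniState (*Transition)(char);

MiniState fromStart(char ch)
{
    if (ch >= 'A' && ch <= 'Z') return InWord;
    if (ch >= '0' && ch <= '9') return InNumber;
    return Reject;
}

MiniState fromWord(char ch)
{
    bool more = (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9');
    return more ? InWord : Accept;
}

MiniState fromNumber(char ch)
{
    if (ch >= '0' && ch <= '9') return InNumber;
    if (ch >= 'A' && ch <= 'Z') return Reject;   // a letter glued to a number is an error
    return Accept;
}

// One entry per non-final state, in the same order as the enum.
Transition miniDFA[3] = { fromStart, fromWord, fromNumber };

// Returns the length of the first token in text, or -1 on error.
int firstTokenLength(const std::string& text)
{
    MiniState state = Start;
    for (std::string::size_type i = 0; i <= text.size(); ++i)
    {
        // A blank stands in for the end of the input.
        char ch = (i < text.size()) ? text[i] : ' ';
        state = miniDFA[state](ch);
        if (state == Accept) return static_cast<int>(i);
        if (state == Reject) return -1;
    }
    return -1;
}
```

The table keeps the driver loop free of any knowledge about individual states, at the price of having to keep the enum order and the array order in step by hand.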

With fstream I tried the function "get". It does not recognize the "new line" character for me, so I have to
add the characters "10" and "13" to the "white space" set myself. It is totally insane!
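
The two usual ways of pulling single characters out of a stream behave differently with white space, which a small sketch can make visible. The function names readWithGet and readWithExtract are hypothetical, and a string stream stands in for the file. In both loops the read itself is the loop test, so nothing is processed after a failed read at the end of the input.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Collects every character, including blanks and line breaks.
std::string readWithGet(std::istream& in)
{
    std::string out;
    char ch;
    while (in.get(ch))
        out += ch;
    return out;
}

// Collects characters with operator>>, which skips white space by default.
std::string readWithExtract(std::istream& in)
{
    std::string out;
    char ch;
    while (in >> ch)
        out += ch;
    return out;
}
```

Fed the text "A B", a line break and "C", readWithGet returns all five characters unchanged, while readWithExtract returns only "ABC". A scanner that needs blanks and line breaks as token separators therefore wants get.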

I also added a strange feature, nested comments. Can you imagine any real use for it? I doubt
it. It is simply a kind of stack.
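
Because only one kind of bracket is involved, the "stack" for nested comments can shrink to a depth counter. The sketch below is one possible way to do it, not a copy of the scanner's own logic; the function name skipNestedComment is hypothetical.

```cpp
#include <string>

// Returns the index just past the '}' that closes the comment opening
// at text[start], or -1 if there is no '{' there or it is never closed.
int skipNestedComment(const std::string& text, std::string::size_type start)
{
    if (start >= text.size() || text[start] != '{') return -1;
    int depth = 0;   // number of comments currently open
    for (std::string::size_type i = start; i < text.size(); ++i)
    {
        if (text[i] == '{')
        {
            ++depth;                              // push
        }
        else if (text[i] == '}' && --depth == 0)  // pop, and stop when empty
        {
            return static_cast<int>(i + 1);
        }
    }
    return -1;   // input ended inside a comment
}
```

For the input "{ { a test} } X" the call skipNestedComment(text, 0) returns 13, the position right after the outer closing brace.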

Sometimes I wonder whether anybody will really be interested enough to read this stuff. So
should I write all this for my own purposes? I guess so, and sometimes my wild imagination even
suggests that this could be an e-record discovered by future archaeologists who are
themselves huge computer systems. These computers feel rather curious about how they were invented and
how they evolved. Besides digging through piles of books, they might also search all those hard disks in various
internet servers. By chance, one might find these words and realize that as early as the
beginning of the 21st century a stupid guy had anticipated what would happen hundreds, or
maybe thousands, of years later.

I always consider the compiler one of the greatest achievements in the history of computer evolution. It is
the communication tool between computer and human. A more sophisticated communication channel should
be established between human and machine. I am willing to devote my whole life to being such an agent or
interpreter or whatever you name it. This is really what I mean by saying my journey is oceans of
stars.

However, this pilgrimage is such a grim road that I often have the heart-broken feeling of a hopeless
situation. Because what if I am not the ONE? I understand exactly what Neo felt when Morpheus told
him that he was the saviour of the world in The Matrix. Except that I try to fool myself by convincing
myself I should be the one. I dare not imagine what happens if I finally find out the truth. Or will I
care about it? I mean, life is simply a procedure. No matter what happens inside the procedure, the
ends are all similar to each other. Should I care about it?

The day before yesterday, I watched the Korean film <My barbarian girlfriend>. It is a
heart-touching film which uncovers a lot of time-washed scenes from the last century, a time
when my blood was still hot enough to burst out of my veins. Given the choice, I would rather occupy
myself with more coding to forget about that chaos. Yesterday I attended a party at my
landlord's home. I had a similar feeling in my heart. The more exciting the surroundings are, the
lonelier I feel. It is such a heart-chilling loneliness that I would rather look for shelter in
machines' languages.
 

D. The major functions
E. Further improvement
F. File listing
1. scanner.h
2. scanner.cpp  
3. tiny.cpp (main)
 
 
file name: scanner.h
#ifndef SCANNER_H
#define SCANNER_H

#include <iostream>
#include <fstream>

using namespace std;

const int MaxTokenLength=1024;
const int StateNumber=22;
const int SingleCount=16;
const int MultiCount=5;
const int LetterCount=26;
const int DigitCount=10;
const int ReservedCount=21;

extern char buffer[MaxTokenLength+1];//in order to display as string

enum TokenKind
{IDKind, NumberKind, StringKind, SingleKind, MultiKind, ReservedKind, CommentKind, 
ErrorKind, eof, bof};



enum StateKind
{Ready, SingleSymbol, BothSymbol, InsideMulti1, InsideMulti2, InsideMulti3, 
BothEqual, BothPeriod1, BothPeriod2, BothPeriod3, BothPeriod4,
StartID, StartNumber, StartComment, StartSingleStr, StartDoubleStr, 
InsideSingleStr, InsideDoubleStr, EndSingleQuote, EndDoubleQuote, InsideComment, 
EndBracket, EndSingle, EndMulti, EndID,  EndNumber, EndComment,
EndDoubleStr,  EndSingleStr, Error};

enum CharSet
{ SingleBegins, BothBegins, MultiBegins, WhiteSpace, Letters, Digits, 
CommentBegins, SingleQuote, DoubleQuote, Others};

class Scanner
{
private:	
	char lastChar;
	ifstream stream;
	bool readChar(char& ch);
	bool isFinal(StateKind state, TokenKind& kind);
	bool isReserved();
public:
	bool openFile(const char* fileName);
	TokenKind next();
	char* output() { return buffer;}
};

#endif

file name: scanner.cpp 
#include <iostream>
#include <cstring>	//strcmp, used by Scanner::isReserved
#include "scanner.h"

using namespace std;


int counter=0;
char buffer[MaxTokenLength+1];

char* tokenName[8]={"identifier", "number", "string", "single-symbol", "multi-symbol",
	"reserved", "comment", "error"};

char single[SingleCount]={'+', '-', '*', '/', '^', '!', ',', ';', '<', '>', '=', '#', '(', 
		')', '[', ']'};
char multiBeginning[MultiCount-1]={':', '<', '>', ','};

char* reserved[ReservedCount]={"AND", "BOOLEAN", "ELSE", "FOR", "MOD", "PROCEDURE", "TRUE",
"ARRAY", "DIVIDES", "END", "IF", "NOT", "REM", "VALUE", "BEGIN", "DO", "FALSE", "INTEGER",
"OR", "THEN", "WHILE"};




StateKind readyCheck(char ch);
StateKind singleSymbolCheck(char ch);
StateKind bothSymbolCheck(char ch);
StateKind insideMulti1Check(char ch);
StateKind insideMulti2Check(char ch);
StateKind insideMulti3Check(char ch);
StateKind bothEqualCheck(char ch);
StateKind bothPeriod1Check(char ch);
StateKind bothPeriod2Check(char ch);
StateKind bothPeriod3Check(char ch);
StateKind bothPeriod4Check(char ch);
StateKind startIDCheck(char ch);
StateKind startNumberCheck(char ch);
StateKind startCommentCheck(char ch);
StateKind startSingleStrCheck(char ch);
StateKind startDoubleStrCheck(char ch);
StateKind insideSingleStrCheck(char ch);
StateKind insideDoubleStrCheck(char ch);
StateKind endSingleQuoteCheck(char ch);
StateKind endDoubleQuoteCheck(char ch);
StateKind insideCommentCheck(char ch);
StateKind endBracketCheck(char ch);



StateKind (*DFA[22])(char ch)=
{
		readyCheck,
		singleSymbolCheck,
		bothSymbolCheck,
		insideMulti1Check,
		insideMulti2Check,
		insideMulti3Check,
		bothEqualCheck,
		bothPeriod1Check,
		bothPeriod2Check,
		bothPeriod3Check,
		bothPeriod4Check,
		startIDCheck,
		startNumberCheck,
		startCommentCheck,
		startSingleStrCheck,
		startDoubleStrCheck,
		insideSingleStrCheck,
		insideDoubleStrCheck,
		endSingleQuoteCheck,
		endDoubleQuoteCheck,
		insideCommentCheck,
		endBracketCheck
};


bool isLoop(StateKind state);
bool isLetter(char ch);
bool isDigit(char ch);
bool isWhiteSpace(char ch);
CharSet checkChar(char ch);


bool isLoop(StateKind state)
{
	return state==Ready||state==StartSingleStr||state==StartDoubleStr
		||state==EndSingleQuote||state==EndDoubleQuote||state==StartComment||
		state==EndBracket;
}



bool isWhiteSpace(char ch)
{
	//line feed (10) and carriage return (13) both count as white space
	return ch==' '||ch=='\n'||ch=='\r';
}

CharSet checkChar(char ch)
{
	if (isLetter(ch))
	{
		return Letters;
	}
	if (isDigit(ch))
	{
		return Digits;
	}
	if (isWhiteSpace(ch))
	{
		return WhiteSpace;
	}
	switch(ch)
	{
	case '+':
		return SingleBegins;
	case '-':
		return SingleBegins;
	case '*':
		return SingleBegins;
	case '/':
		return SingleBegins;
	case '^':
		return SingleBegins;
	case '!':
		return SingleBegins;
	case ';':
		return SingleBegins;
	case '=':
		return SingleBegins;
	case '#':
		return SingleBegins;
	case '(':
		return SingleBegins;
	case ')':
		return SingleBegins;
	case '[':
		return SingleBegins;
	case ':':
		return MultiBegins;
	case ']':
		return SingleBegins;
	case '<':
		return BothBegins;
	case '>':
		return BothBegins;
	case ',':
		return BothBegins;
	case '{':
		return CommentBegins;
	case '\'':
		return SingleQuote;
	case '"':
		return DoubleQuote;
	default:
		return Others;
	}
}

bool Scanner::openFile(const char* fileName)
{	
	stream.open(fileName, ios::in);

	if (!stream.is_open())
	{
		cout<<"cannot open the file\n";
		return false;
	}
	if (readChar(lastChar))
	{
		cout<<"ready to scan! boss!\n";
		return true;
	}
	else
	{
		cout<<"the file is empty\n";
		return false;
	}
}

bool Scanner::isReserved()
{
	for (int i=0; i<ReservedCount; i++)
	{
		if (strcmp(buffer, reserved[i])==0)
		{
			return true;
		}
	}
	return false;
}

bool Scanner::isFinal(StateKind state, TokenKind& kind)
{
	switch (state)
	{
	case EndSingle:
		kind=SingleKind;
		return true;
	case EndMulti:
		kind=MultiKind;
		return true;
	case EndID:
		kind=IDKind;
		return true;
	case EndNumber:
		kind=NumberKind;
		return true;
	case EndDoubleStr:
		kind=StringKind;
		return true;
	case EndSingleStr:
		kind=StringKind;
		return true;
	case EndComment:
		kind=CommentKind;
		return true;
	case Error:
		kind=ErrorKind;
		return true;
	default:
		return false;
	}
}


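bool Scanner::readChar(char& ch)
{
	//get() keeps white space, which operator>> would skip, and the read
	//itself is the test, so a failed read at end of file is never
	//reported as a character
	if (stream.get(ch))
	{
		return true;
	}
	else
	{
		return false;
	}
}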

TokenKind Scanner::next()
{
	TokenKind kind=eof;
	
	StateKind state=Ready;
	counter=0;
	do
	{
		state=DFA[state](lastChar);
		if (isFinal(state, kind))//error is also consider to be final
		{		
			buffer[counter]='\0';
			if (kind==IDKind&&isReserved())
			{
				kind =ReservedKind;
			}
			if (state==Error)
			{
				cout<<"encounter error\n";
			}
			return kind;			
		}
		
		//begin counting, and I don't want to output comment,
		//because comment may be very, very long....
		if (!isLoop(state)&&state!=InsideComment)
		{
			//it begins
			buffer[counter]=lastChar;
			counter++;
			if (counter>MaxTokenLength)
			{
				cout<<"Max Token Length Reached!\n";
				return ErrorKind;
			}
		}

		cout<<lastChar;//output anyway
	
	
	}while (readChar(lastChar));

	if (kind!=eof)
	{
		return ErrorKind;
	}
	return kind;//default is eof
}


StateKind readyCheck(char ch)
{
	//starting state must be ready
	switch(checkChar(ch))
	{
	case WhiteSpace:
		return Ready;
	case SingleBegins:
		return SingleSymbol;
	case BothBegins:
		return BothSymbol;
	case MultiBegins:
		return InsideMulti1;
	case Letters:
		return StartID;
	case Digits:
		return StartNumber;
	case SingleQuote:
		return StartSingleStr;
	case DoubleQuote:
		return StartDoubleStr;
	case CommentBegins:
		return StartComment;
	default:
		return Error;
	}

}

StateKind startIDCheck(char ch)
{
	if (isLetter(ch)||isDigit(ch))
	{
		return StartID;
	}
	else
	{
		return	EndID;
	}
}

StateKind endDoubleQuoteCheck(char ch)
{
	return EndDoubleStr;
}

StateKind startCommentCheck(char ch)
{
	if (ch=='}')
	{
		return EndComment;
	}
	else
	{
		return InsideComment;
	}
}


StateKind startNumberCheck(char ch)
{
	if (isDigit(ch))
	{
		return StartNumber;
	}
	if (isLetter(ch))
	{
		return Error;
	}
	return EndNumber;
}


StateKind endSingleQuoteCheck(char ch)
{
	return EndSingleStr;
}


StateKind bothEqualCheck(char ch)
{
	if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
	{
		return EndMulti;
	}
	else
	{
		return Error;
	}
}



StateKind bothPeriod1Check(char ch)
{
	if (ch=='.')
	{
		return BothPeriod2;
	}
	else
	{
		return Error;
	}
}

StateKind bothPeriod2Check(char ch)
{
	if (ch=='.')
	{
		return BothPeriod3;
	}
	else
	{
		return Error;
	}
}

StateKind bothPeriod3Check(char ch)
{
	if (ch==',')
	{
		return BothPeriod4;
	}
	else
	{
		return Error;
	}
}



StateKind bothPeriod4Check(char ch)
{
	if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
	{
		return EndMulti;
	}
	else
	{
		return Error;
	}
}


StateKind bothSymbolCheck(char ch)
{
	if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
	{
		return EndSingle;
	}
	else
	{
		if ((buffer[0]=='<'||buffer[0]=='>')&&(ch=='='))
		{
			return BothEqual;		
		}
		if (buffer[0]==','&&ch=='.')
		{
			return BothPeriod1;
		}
		//default
		return Error;
	}
}

StateKind singleSymbolCheck(char ch)
{
	return EndSingle;
}



StateKind startSingleStrCheck(char ch)
{
	if (ch=='\'')
	{
		return EndSingleStr;
	}
	else
	{
		return InsideSingleStr;
	}
}

StateKind insideCommentCheck(char ch)
{
	if (ch=='}')
	{
		return EndBracket;
	}
	else
	{
		return InsideComment;
	}
}


StateKind startDoubleStrCheck(char ch)
{
	if (ch=='"')
	{
		return EndDoubleStr;
	}
	else
	{
		return InsideDoubleStr;
	}
}

StateKind insideMulti1Check(char ch)
{
	if (ch=='=')
	{
		return InsideMulti2;
	}
	else
	{
		return Error;
	}
}

StateKind insideMulti2Check(char ch)
{
	if (ch==':')
	{
		return InsideMulti3;
	}
	else
	{
		if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
		{
			return EndMulti;
		}
		else
		{
			return Error;
		}
	}
}

StateKind insideMulti3Check(char ch)
{
	if (isWhiteSpace(ch)||isLetter(ch)||isDigit(ch))
	{
		return EndMulti;
	}
	else
	{
		return Error;
	}
}

StateKind endBracketCheck(char ch)
{
	return EndComment;
}


StateKind insideSingleStrCheck(char ch)
{
	if (ch=='\'')
	{
		return EndSingleQuote;
	}
	else
	{
		return InsideSingleStr;
	}
}

StateKind insideDoubleStrCheck(char ch)
{
	if (ch=='"')
	{
		return EndDoubleQuote;
	}
	else
	{
		return InsideDoubleStr;
	}
}



bool isLetter(char ch)
{
	return ch>='A'&&ch<='Z';
}

bool isDigit(char ch)
{
	return ch>='0'&&ch<='9';
}

file name: tiny.cpp (main)
#include <iostream>
#include <fstream>
#include "scanner.h"

using namespace std;

extern char* tokenName[8];

int main()
{
	/*
	//this is a joke, as I want to change all character to capital
	char ch;
	ifstream in;
	ofstream out;
	in.open("c:\\sourcecode.txt", ios::in);
	out.open("c:\\newsourcecode.txt", ios::out);
	while (in.get(ch))
	{
		if (ch>='a'&&ch<='z')
		{
			ch-='a'-'A';
		}
		out<<ch;
	}
	in.close();
	out.close();
*/

	Scanner S;
	TokenKind kind;
	if (!S.openFile("c:\\sourcecode.txt"))
	{
		return 1;//nothing to scan
	}
	kind=S.next();
	while (kind!=eof&&kind!=ErrorKind)
	{
		cout<<"\nthe token type is:"<<tokenName[kind]<<" value:"<<S.output()<<endl;
		kind=S.next();
	}

	return 0;
}


Here is the result. The input file is "c:\sourcecode.txt". I feel kind of insane for trying to
implement "nested comments", which are by all means a kind of garbage! Why did I try to do it? I really have
no idea. The input file is part of this program, and in order to satisfy the strange requirements I had to
replace every "::" and '{' with some other symbol, like '<'. I highly doubt anybody would be interested in the following
results.
ready to scan! boss!
{ 
{ 
a test} 
}
the token type is:comment value:

CHARSET
the token type is:identifier value:CHARSET
CHECKCHAR
the token type is:identifier value:CHECKCHAR
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISLETTER(CH)) 
{ 
RETURN LETTERS; 
} 
IF (ISDIGIT(CH)) 
{ 
RETURN DIGITS; 
} 
IF (ISWHITESPACE(CH)) 
{ 
RETURN WHITESPACE; 
} 
SWITCH(CH) 
{ 
CASE '+': 
RETURN SINGLEBEGINS; 
CASE '-': 
RETURN SINGLEBEGINS; 
CASE '*': 
RETURN SINGLEBEGINS; 
CASE '/': 
RETURN SINGLEBEGINS; 
CASE '^': 
RETURN SINGLEBEGINS; 
CASE '!': 
RETURN SINGLEBEGINS; 
CASE ';': 
RETURN SINGLEBEGINS; 
CASE '=': 
RETURN SINGLEBEGINS; 
CASE '#': 
RETURN SINGLEBEGINS; 
CASE '(': 
RETURN SINGLEBEGINS; 
CASE ')': 
RETURN SINGLEBEGINS; 
CASE '[': 
RETURN SINGLEBEGINS; 
CASE ':': 
RETURN MULTIBEGINS; 
CASE ']': 
RETURN SINGLEBEGINS; 
CASE '<': 
RETURN BOTHBEGINS; 
CASE '>': 
RETURN BOTHBEGINS; 
CASE ',': 
RETURN BOTHBEGINS; 
CASE ']': 
RETURN COMMENTBEGINS; 
CASE '\'': 
RETURN SINGLEQUOTE; 
CASE '"': 
RETURN DOUBLEQUOTE; 
DEFAULT: 
RETURN OTHERS; 
} 
}
the token type is:comment value:


BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
>
the token type is:single-symbol value:>
OPENFILE
the token type is:identifier value:OPENFILE
(
the token type is:single-symbol value:(
CONST
the token type is:identifier value:CONST
CHAR
the token type is:identifier value:CHAR
*
the token type is:single-symbol value:*
FILENAME
the token type is:identifier value:FILENAME
)
the token type is:single-symbol value:)

{ 
STREAM.OPEN(FILENAME, IOS::IN); 

IF (READCHAR(LASTCHAR)) 
{ 
COUT<<"READY TO SCAN! BOSS!\N"; 
RETURN TRUE; 
} 
ELSE 
{ 
COUT<<"THE FILE IS EMPTY\N"; 
RETURN FALSE; 
} 
}
the token type is:comment value:


BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
<
the token type is:single-symbol value:<
ISRESERVED
the token type is:identifier value:ISRESERVED
(
the token type is:single-symbol value:(
)
the token type is:single-symbol value:)

{ 
FOR (INT I=0; I<RESERVEDCOUNT; I++) 
{ 
IF (STRCMP(BUFFER, RESERVED[I])==0) 
{ 
RETURN TRUE; 
} 
} 
RETURN FALSE; 
}
the token type is:comment value:


BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
>
the token type is:single-symbol value:>
ISFINAL
the token type is:identifier value:ISFINAL
(
the token type is:single-symbol value:(
STATEKIND
the token type is:identifier value:STATEKIND
STATE
the token type is:identifier value:STATE
,
the token type is:single-symbol value:,
TOKENKIND
the token type is:identifier value:TOKENKIND
KIND
the token type is:identifier value:KIND
)
the token type is:single-symbol value:)

{ 
SWITCH (STATE) 
{ 
CASE ENDSINGLE: 
KIND=SINGLEKIND; 
RETURN TRUE; 
CASE ENDMULTI: 
KIND=MULTIKIND; 
RETURN TRUE; 
CASE ENDID: 
KIND=IDKIND; 
RETURN TRUE; 
CASE ENDNUMBER: 
KIND=NUMBERKIND; 
RETURN TRUE; 
CASE ENDDOUBLESTR: 
KIND=STRINGKIND; 
RETURN TRUE; 
CASE ENDSINGLESTR: 
KIND=STRINGKIND; 
RETURN TRUE; 
CASE ENDCOMMENT: 
KIND=COMMENTKIND; 
RETURN TRUE; 
CASE ERROR: 
KIND=ERRORKIND; 
RETURN TRUE; 
DEFAULT: 
RETURN FALSE; 
} 
}
the token type is:comment value:



BOOL
the token type is:identifier value:BOOL
SCANNER
the token type is:identifier value:SCANNER
<
the token type is:single-symbol value:<
READCHAR
the token type is:identifier value:READCHAR
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (STREAM.EOF()) 
{ 
RETURN FALSE; 
} 
ELSE 
{ 
STREAM>>CH; 
RETURN TRUE; 
} 
}
the token type is:comment value:


TOKENKIND
the token type is:identifier value:TOKENKIND
SCANNER
the token type is:identifier value:SCANNER
>
the token type is:single-symbol value:>
NEXT
the token type is:identifier value:NEXT
(
the token type is:single-symbol value:(
)
the token type is:single-symbol value:)

{ 
TOKENKIND KIND=EOF; 

STATEKIND STATE=READY; 
COUNTER=0; 
DO 
{ 
STATE=DFA[STATE](LASTCHAR); 
IF (ISFINAL(STATE, KIND))//ERROR IS ALSO CONSIDER TO BE FINAL 
{ 
BUFFER[COUNTER]='\0'; 
IF (KIND==IDKIND&&ISRESERVED()) 
{ 
KIND =RESERVEDKIND; 
} 
IF (STATE==ERROR) 
{ 
COUT<<"ENCOUNTER ERROR\N"; 
} 
RETURN KIND; 
} 

//BEGIN COUNTING, AND I DON'T WANT TO OUTPUT COMMENT, 
//BECAUSE COMMENT MAY BE VERY, VERY LONG.... 
IF (!ISLOOP(STATE)&&STATE!=INSIDECOMMENT) 
{ 
//IT BEGINS 
BUFFER[COUNTER]=LASTCHAR; 
COUNTER++; 
IF (COUNTER>MAXTOKENLENGTH) 
{ 
COUT<<"MAX TOKEN LENGTH REACHED!\N"; 
RETURN ERRORKIND; 
} 
} 

COUT<<LASTCHAR;//OUTPUT ANYWAY 


}WHILE (READCHAR(LASTCHAR)); 

IF (KIND!=EOF) 
{ 
RETURN ERRORKIND; 
} 
RETURN KIND;//DEFAULT IS EOF 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
READYCHECK
the token type is:identifier value:READYCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
//STARTING STATE MUST BE READY 
SWITCH(CHECKCHAR(CH)) 
{ 
CASE WHITESPACE: 
RETURN READY; 
CASE SINGLEBEGINS: 
RETURN SINGLESYMBOL; 
CASE BOTHBEGINS: 
RETURN BOTHSYMBOL; 
CASE MULTIBEGINS: 
RETURN INSIDEMULTI1; 
CASE LETTERS: 
RETURN STARTID; 
CASE DIGITS: 
RETURN STARTNUMBER; 
CASE SINGLEQUOTE: 
RETURN STARTSINGLESTR; 
CASE DOUBLEQUOTE: 
RETURN STARTDOUBLESTR; 
CASE COMMENTBEGINS: 
RETURN STARTCOMMENT; 
DEFAULT: 
RETURN ERROR; 
} 

}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
STARTIDCHECK
the token type is:identifier value:STARTIDCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISLETTER(CH)||ISDIGIT(CH)) 
{ 
RETURN STARTID; 
} 
ELSE 
{ 
RETURN ENDID; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
ENDDOUBLEQUOTECHECK
the token type is:identifier value:ENDDOUBLEQUOTECHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
RETURN ENDDOUBLESTR; 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
STARTCOMMENTCHECK
the token type is:identifier value:STARTCOMMENTCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='') 
{ 
RETURN ENDCOMMENT; 
} 
ELSE 
{ 
RETURN INSIDECOMMENT; 
} 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
STARTNUMBERCHECK
the token type is:identifier value:STARTNUMBERCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISDIGIT(CH)) 
{ 
RETURN STARTNUMBER; 
} 
IF (ISLETTER(CH)) 
{ 
RETURN ERROR; 
} 
RETURN ENDNUMBER; 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
ENDSINGLEQUOTECHECK
the token type is:identifier value:ENDSINGLEQUOTECHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
RETURN ENDSINGLESTR; 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
BOTHEQUALCHECK
the token type is:identifier value:BOTHEQUALCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) 
{ 
RETURN ENDMULTI; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:




STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD1CHECK
the token type is:identifier value:BOTHPERIOD1CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='.') 
{ 
RETURN BOTHPERIOD2; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD2CHECK
the token type is:identifier value:BOTHPERIOD2CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='.') 
{ 
RETURN BOTHPERIOD3; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD3CHECK
the token type is:identifier value:BOTHPERIOD3CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH==',') 
{ 
RETURN BOTHPERIOD4; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:




STATEKIND
the token type is:identifier value:STATEKIND
BOTHPERIOD4CHECK
the token type is:identifier value:BOTHPERIOD4CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) 
{ 
RETURN ENDMULTI; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
BOTHSYMBOLCHECK
the token type is:identifier value:BOTHSYMBOLCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) 
{ 
RETURN ENDSINGLE; 
} 
ELSE 
{ 
IF ((BUFFER[0]=='<'||BUFFER[0]=='>')&&(CH=='=')) 
{ 
RETURN BOTHEQUAL; 
} 
IF (BUFFER[0]==','&&CH=='.') 
{ 
RETURN BOTHPERIOD1; 
} 
//DEFAULT 
RETURN ERROR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
SINGLESYMBOLCHECK
the token type is:identifier value:SINGLESYMBOLCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
RETURN ENDSINGLE; 
}
the token type is:comment value:




STATEKIND
the token type is:identifier value:STATEKIND
STARTSINGLESTRCHECK
the token type is:identifier value:STARTSINGLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='\'') 
{ 
RETURN ENDSINGLESTR; 
} 
ELSE 
{ 
RETURN INSIDESINGLESTR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
INSIDECOMMENTCHECK
the token type is:identifier value:INSIDECOMMENTCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH==']') 
{ 
RETURN ENDBRACKET; 
} 
ELSE 
{ 
RETURN INSIDECOMMENT; 
} 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
STARTDOUBLESTRCHECK
the token type is:identifier value:STARTDOUBLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='"') 
{ 
RETURN ENDDOUBLESTR; 
} 
ELSE 
{ 
RETURN INSIDEDOUBLESTR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
INSIDEMULTI1CHECK
the token type is:identifier value:INSIDEMULTI1CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='=') 
{ 
RETURN INSIDEMULTI2; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
INSIDEMULTI2CHECK
the token type is:identifier value:INSIDEMULTI2CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH==':') 
{ 
RETURN INSIDEMULTI3; 
} 
ELSE 
{ 
IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) 
{ 
RETURN ENDMULTI; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
INSIDEMULTI3CHECK
the token type is:identifier value:INSIDEMULTI3CHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (ISWHITESPACE(CH)||ISLETTER(CH)||ISDIGIT(CH)) 
{ 
RETURN ENDMULTI; 
} 
ELSE 
{ 
RETURN ERROR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
ENDBRACKETCHECK
the token type is:identifier value:ENDBRACKETCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
RETURN ENDCOMMENT; 
}
the token type is:comment value:



STATEKIND
the token type is:identifier value:STATEKIND
INSIDESINGLESTRCHECK
the token type is:identifier value:INSIDESINGLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='\'') 
{ 
RETURN ENDSINGLEQUOTE; 
} 
ELSE 
{ 
RETURN INSIDESINGLESTR; 
} 
}
the token type is:comment value:


STATEKIND
the token type is:identifier value:STATEKIND
INSIDEDOUBLESTRCHECK
the token type is:identifier value:INSIDEDOUBLESTRCHECK
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
IF (CH=='"') 
{ 
RETURN ENDDOUBLEQUOTE; 
} 
ELSE 
{ 
RETURN INSIDEDOUBLESTR; 
} 
}
the token type is:comment value:




BOOL
the token type is:identifier value:BOOL
ISLETTER
the token type is:identifier value:ISLETTER
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
RETURN CH>='A'&&CH<='Z'; 
}
the token type is:comment value:


BOOL
the token type is:identifier value:BOOL
ISDIGIT
the token type is:identifier value:ISDIGIT
(
the token type is:single-symbol value:(
CHAR
the token type is:identifier value:CHAR
CH
the token type is:identifier value:CH
)
the token type is:single-symbol value:)

{ 
RETURN CH>='0'&&CH<='9'; 
}
the token type is:comment value:













