Reduce

A.First Edition
This is my second edition of Reduce, which has only a single usage to translate a NFA into regular expression.
B.The problem
I feel unsatisfied with the code. It is ugly! However, I don't have enough energy to make a perfect design.
In order to translate the "Transition Diagram" into "Regular Expression", you need to eliminate all nodes except
the start and final ones. 1. Find out all inward and outward arrows and then replace the node by combining each
inward starting node with each outward ending node. Their expressions should be concatenated.
C.The idea of program
This edition corrects several big bugs in first one. 
1. The first edition assumes that at final stage which only includes "start" and "final" and there is only arrow
from start to final". But this is not true. There can be either only one "start" and "final" state or there can
be arrows from "final" to "start". If there is arrow from final to start, the final expression will be quite
complicated. Suppose we name loop string at start, string from start to final, string from final to start and
string looping at final state as r1, r2, r3, r4 then the correct expression is:
star(r1).r2.star(r4+r3.star(r1).r2). 
2. The special case is that the start and final state are actually one state. Then we only need the looping string.
3. Another minor bug is that the first version assumes that after all eliminating operations, the final state won't
have parallel arrows. This is not always true. Suppose the initial states is only including the parallel arrows, 
and since if there is no node to be eliminated, then the parallel arrows won't be combined. So, we have to check.
4. The final small special case is to remove the trap node which only has inBound arrows. 
5. The string operations of star, dot, plus are boring, and I cannot return local variable. So, I defined a private
buffer to be returned as pointer.
D.The major functions
C.Further improvement
I don't bother to check the result since it is too complicated. So, I have no idea if it is correct or not.
#include <iostream>
#include <fstream>

using namespace std;

enum States
{InBound, OutBound, IsLoop};

enum FinalArrow
{StartLoop, StartFinal, FinalStart, FinalLoop};

struct Function
{
	int domain;
	char* expr;
	int range;
	bool isStart;//means the domain is start
	bool isFinal;//means the range is final	
};

class Reduce
{
private:
	char privateBuffer[512];
	Function rules[30];
	Function temp[30];
	int tempCount;
	int size;
	int inCount;
	int outCount;
	int loopCount;
	//this record the three type arrow(inbound,outbound, loop) for a particular node
	int buffer[3][30];					  	
	bool findStates(int state);
	bool findShortCut(int state);
	void replace(int state);
	void copyFunction(Function* target, Function* source, bool toClear=false);
	void generate();
	void moveRule(int state);
//	bool notAffected(int index);
	void clearState();
	void addFunction(int index);
	void getExpr();
	void copyStr(char* target, const char* source);
	char* getFinalExpr(char str[][256]);
	char* star(const char* source);
	char* dot(char* target, const char* source);
	char* plus(char* target, const char* source);
public:
	void setParam(Function**theRule, int theSize);
	void readRules(const char* fileName);
	void beginReduce();

};

int main()
{
	Reduce R;
	//you need to copy my input file to your local C:
	R.readRules("c:\\rules.txt");
	R.beginReduce();
	
	return 0;
}


char* Reduce::star(const char* source)
{
	if (source==NULL)
	{
		return NULL;
	}
	strcpy(privateBuffer, "star(");
	strcat(strcat(privateBuffer, source), ")");
	return privateBuffer;
}

char* Reduce::dot(char* target, const char* source)
{
	if (source!=NULL)
	{
		if (target==NULL)
		{
			strcpy(target, source);
		}
		else
		{
			strcat(strcat(target, "."), source);
		}
	}
	
	return target;
}

char* Reduce::plus(char* target, const char* source)
{
	if (source==NULL)
	{
		return target;
	}
	strcat(target, "+");
	return strcat(target, source);
}


char* Reduce::getFinalExpr(char str[][256])
{
	//the formula is: star(r1).r2.star(r4+r3.star(r1).r2)
	char exp[512];
	char buf[512];
	strcpy(exp, str[FinalStart]);//r3
	dot(exp, star(str[StartLoop]));
	dot(exp, str[StartFinal]);//r3.star(r1).r2
	dot(exp, str[FinalLoop]); //exp+r4, reversed
	strcpy(buf, star(exp));
	strcpy(exp, str[StartLoop]);//exp=r1
	strcpy(exp, star(exp));//star(r1);
	dot(exp, str[StartFinal]);//exp.r2
	dot(exp, buf);
	strcpy(privateBuffer, exp);
	return privateBuffer;
}


void Reduce::copyStr(char* target, const char* source)
{
//	delete []target;
	target = new char[strlen(source)+1];
	strcpy(target, source);
}

//add a function with index of "index" in rules to "temp"
void Reduce::addFunction(int index)
{
	char str[256];
	//scan to see if there are parallel function already in temp
	for (int i=0; i<tempCount; i++)
	{
		//try to union two expr, if they are parallel
		if (temp[i].domain == rules[index].domain&&temp[i].range==rules[index].range)
		{
			strcpy(str, "(");
			strcat(str, temp[i].expr);
			strcat(strcat(str, "+"), rules[index].expr);
			strcat(str, ")");
			delete []temp[i].expr;
			temp[i].expr = new char[strlen(str)+1];
			strcpy(temp[i].expr, str);
			//tempCount don't increment!
			return;
		}	
	}
	//if not found, copy
	copyFunction(&temp[tempCount], &rules[index]);
	tempCount++;
}


void Reduce::readRules(const char* fileName)
{
	ifstream f;
	int flag;
	char str[256];
	size = 0;
	f.open(fileName, ios::in);
	while (!f.eof())
	{
		f>>rules[size].domain>>str>>rules[size].range>>flag;
		rules[size].isStart = flag==1?true:false;
		f>>flag;
		rules[size].isFinal = flag==1?true:false;
		rules[size].expr = new char[strlen(str)+1];
		strcpy(rules[size].expr, str);
		size++;
	}
}



void Reduce::clearState()
{
	//those three type arrow should be cleared
	for (int i=0; i<inCount; i++)
	{
		delete [] rules[buffer[InBound][i]].expr;
	}
	for (i=0; i<outCount; i++)
	{
		delete [] rules[buffer[OutBound][i]].expr;
	}
	for (i =0; i<loopCount; i++)
	{
		delete [] rules[buffer[IsLoop][i]].expr;
	}
}

void Reduce::copyFunction(Function* target, Function* source, bool toClear)
{
	target->domain = source->domain;
	target->range = source->range;
	if (toClear)
	{
		delete [] target->expr;
	}
	target->expr = new char[strlen(source->expr)+1];
	strcpy(target->expr, source->expr);
	target->isFinal = source->isFinal;
	target->isStart = source->isStart;
}

void Reduce::moveRule(int state)
{
	for (int i=0; i<size; i++)
	{
		//if not affected
		if (rules[i].domain!=state&&rules[i].range!=state)
		{
			addFunction(i);		
		}
	}
	clearState();
	//copy back from temp to rules, in order for next
	for (i=0; i<tempCount; i++)
	{
		copyFunction(&rules[i], &temp[i]);
	}
	size = tempCount;
}

void Reduce::setParam(Function** theRule, int theSize)
{
	size = theSize;
	for (int i=0; i<size; i++)
	{
		copyFunction(&rules[i], theRule[i]);
	}	
}

void Reduce::replace(int state)
{
	generate();
	moveRule(state);
}

void Reduce::generate()
{
	tempCount=0;
	char str[256];
	int in, out, loop;

	//this handles the trap situation
	if (outCount==0)
	{
		return;
	}
	
	for (int i=0; i<inCount; i++)
	{		
		for (int j=0; j<outCount; j++)
		{
			in = buffer[InBound][i];
			out = buffer[OutBound][j];

			temp[tempCount].domain = rules[in].domain;
			temp[tempCount].isStart = rules[in].isStart;//the start from in
			temp[tempCount].range = rules[out].range;
			temp[tempCount].isFinal = rules[out].isFinal;//final from out
			strcpy(str, rules[in].expr);
			for (int k=0; k<loopCount; k++)
			{
				loop = buffer[IsLoop][k];
				if (k==0)
				{
					strcat(str, ".star");
					strcat(str, "(");
				}
				strcat(strcat(str, rules[loop].expr), "+");
				if (k==loopCount-1)
				{
					//replace '+' with ')'
					str[strlen(str)-1]=')';
				}
			}
			strcat(str, ".");
			strcat(str, rules[out].expr);
			
			temp[tempCount].expr = new char[strlen(str)+1];
			strcpy(temp[tempCount].expr, str);
			//finished one record
			tempCount++;
		}
	}
}


bool Reduce::findStates(int state)
{
	for (int i=0; i<size; i++)
	{		
		if (rules[i].domain==state||rules[i].range==state)
		{
			return true;
		}
	}
	return false;
}

bool Reduce::findShortCut(int state)
{
	inCount = outCount = loopCount = 0;
	for (int i=0; i<size; i++)
	{
		//exclusiveOr and this is a loop
		if (rules[i].domain==state&&rules[i].range==state)
		{
			//the loop 
			if (rules[i].isFinal||rules[i].isStart)
			{
				return false;
			}
			buffer[IsLoop][loopCount]=i;
			loopCount++;
		}
		else
		{			
			if (rules[i].domain==state)
			{
				//if it is start, cannot remove it.
				if (rules[i].isStart)
				{
					return false;
				}
				buffer[OutBound][outCount]=i;
				outCount++;
			}
			if (rules[i].range==state)
			{
				//if it is final, we cannot remove it
				if (rules[i].isFinal)
				{
					return false;
				}			
				buffer[InBound][inCount]=i;
				inCount++;
			}
		}
	}
	//must both have in and out, however, if there is no outBound, it
	//means it is a trap, should be removed immediately
	if (inCount==0)
	{
		cout<<"Error! unaccessable node!\n";
		return false;
	}
	else
	{
		return true;
	}
}
			
void Reduce::getExpr()
{
	char str[4][256];	
	str[0][0]='\0';
	str[1][0]='\0';
	str[2][0]='\0';
	str[3][0]='\0';
	int startLoop =0, startFinal=0, finalStart=0, finalLoop=0, singleLoop=0;
	for (int i=0; i<size; i++)
	{
		//loop at start, not at final
		if (rules[i].isStart&&!rules[i].isFinal&&rules[i].domain==rules[i].range)
		{
			//for the first one, copy "star"
			if (startLoop==0)
			{
				strcat(strcpy(str[StartLoop], "star("), rules[i].expr);
			}
			else
			{
				//for the second or more,
				strcat(str[StartLoop], rules[i].expr);
			}
			strcat(str[StartLoop], "+");
			startLoop++;
		}
		//there is two directions: one from start to final
		//the other is from final to start!!!
		//this is from start to final!
		//I wonder if there is two of such string? should not???
		//yes, it maybe! for an example, there is originally two 
		//parallel arrows between start and final and no operation has
		//ever been taken, so no merge performed.
		if (rules[i].domain!=rules[i].range&&rules[i].isStart&&rules[i].isFinal)
		{
			//the very first expr, use copy
			if (startFinal==0)
			{
				strcpy(str[StartFinal], rules[i].expr);
			}
			else
			{
				//all other use concat
				strcat(str[StartFinal], rules[i].expr);
			}
			strcat(str[StartFinal], "+");
			startFinal++;
		}
		//the arrow from final to start
		if (rules[i].domain!=rules[i].range&&!rules[i].isStart&&!rules[i].isFinal)
		{
			if (finalStart==0)
			{
				strcpy(str[FinalStart], rules[i].expr);
			}
			else
			{
				strcat(str[FinalStart], rules[i].expr);
			}
			strcat(str[FinalStart], "+");
			finalStart++;
		}
		//final loop, but not start loop
		if (rules[i].isFinal&&!rules[i].isStart&&rules[i].domain==rules[i].range)
		{
			//for the first one, copy "star"
			if (finalLoop==0)
			{
				strcat(strcpy(str[FinalLoop], "star("), rules[i].expr);
			}
			else
			{
				//for the second or more,
				strcat(str[FinalLoop], rules[i].expr);
			}
			strcat(str[FinalLoop], "+");
			finalLoop++;
		}
		//here is the spcial case: start and final are the same
		if (rules[i].isFinal&&rules[i].isStart&&rules[i].domain==rules[i].range)
		{
			//for the first one, copy "star"
			if (singleLoop==0)
			{
				//we "borrow" this string to express
				strcat(strcpy(str[StartLoop], "star("), rules[i].expr);
			}
			else
			{
				//for the second or more,
				strcat(str[StartLoop], rules[i].expr);
			}
			strcat(str[StartLoop], "+");
			singleLoop++;
		}
	}
	//overwrite "+" with ")"
	if (singleLoop>0)
	{
		str[StartLoop][strlen(str[StartLoop])-1]=')';
		cout<<str[StartLoop]<<endl;
	}
	else
	{
		if (startLoop>0)
		{
			str[StartLoop][strlen(str[StartLoop])-1]=')';
		}
		if (startFinal>0)
		{
			//must not have more '+'! so write \0 to end
			str[StartFinal][strlen(str[StartFinal])-1]='\0';
		}
		if (finalStart>0)
		{
			str[FinalStart][strlen(str[FinalStart])-1]='\0';
		}
		if (finalLoop>0)
		{
			str[FinalLoop][strlen(str[FinalLoop])-1]=')';
		}
		cout<<getFinalExpr(str)<<endl;
	}
}

void Reduce::beginReduce()
{
	int state=0;
	while (findStates(state))
	{
		if (findShortCut(state))
		{
			replace(state);
		}
		state++;
	}
	getExpr();
}




	
	
	

The result is like following:
star(star((b.star(b).a.b+a))).b.star(b).a.a.star(.star(star((b.star(b).a.b+a))).b.star(b).a.a.star((
a+b)))
Press any key to continue

กก

**explanation**

1. I used "star" to represent "keleen star" as I cannot print the exponent-like "*"'s.

2. I used "." to show it is a concatenation for better reading.

3. "+" sign still means the "union".

4. Here is the input file which is the DFA for a language over {a,b} such that "baa" is its substring.

5. The input file format is like following:

    Domain   Expression   Range  isStart  isFinal

6. Domain and Range means that the transition functions is starting from Domain and ending at Range which is

    the index of states. The expression is the label. When Domain is the initial state, isState is true,

    indicated by "1". Otherwise "0". When Range is the final state, isFinal is "1". Otherwise "0".

7. The input file is named "rules.txt" and stored in place of "c:\".

กก
	

			


                                 back.gif (341 bytes)       up.gif (335 bytes)         next.gif (337 bytes)