Geeks With Blogs
El Grego BizTalk blog
This post is about Finite State Machines; the Flying Spaghetti Monster is for next time.

For one of the EAI projects I'm working on, I needed a specialized StreamReader that puts a wrap character (a double quote in the example below) around every field of a char-based stream. For example:

field1-1,field1-2;field2-1,field2-2 => "field1-1","field1-2";"field2-1","field2-2"

My first approach was the currently popular test-driven one: a default pass-through Read method and a bunch of tests that, of course, all failed initially. I trusted my creativity to come up with all the different kinds of input strings:

NoFieldSeperator
NoRecordSeperator
EmptyString
FieldWithSingleWrapCharInfix
FieldWithSingleWrapCharPostfix
etc...

Inside the reader I added one if-then-else clause after another to manipulate the output until a given test succeeded. Of course, every newly added if-clause could break one or more of the previous tests. As you can imagine, it took a lot of time and far too many iterations to get this very simple parser working!

After a while I was lucky enough to get all cases working: hurray! In fact, not really :-(
I didn't have complete confidence in what I had just built and how it was built. This wouldn't work for more complex scenarios. How could I ever be sure that I didn't forget some special case? I needed a more formal approach...
 
So I modeled the reader as a finite state machine, defining for every state the action, i.e. the desired output character(s), for each input character. It's very easy: just start drawing on a piece of paper and think about the states and the output for every input. The reader can be in one of two states: 'inside a field' or 'outside a field'.


I have defined 'special' input tokens that require special handling (different action) or trigger a state transition:

RD: record delimiter
FD: field delimiter
WS: whitespace
EoF: end of file
WC: wrap character
C: any other char
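The token list above could be captured as an enum plus a small classifier. This is only a minimal sketch: the names InputToken and TokenClassifier are made up for illustration, and the default delimiters (',' as FD, ';' as RD, '"' as WC) are assumptions taken from the examples in this post, not from the actual reader.

```csharp
using System;

// Hypothetical names; the members mirror the token list above.
public enum InputToken
{
    RecordDelimiter, // RD
    FieldDelimiter,  // FD
    Whitespace,      // WS
    EndOfFile,       // EoF
    WrapChar,        // WC
    Char             // C: any other char
}

public static class TokenClassifier
{
    // 'character' follows the TextReader.Read() convention: -1 means end of stream.
    // The default delimiter values are assumptions based on the examples in this post.
    public static InputToken Classify(
        int character,
        char fieldDelimiter = ',',
        char recordDelimiter = ';',
        char wrapChar = '"')
    {
        if (character < 0) return InputToken.EndOfFile;

        char c = (char)character;

        // Delimiters are checked before whitespace, so a tab or newline that is
        // configured as a delimiter is still reported as a delimiter.
        if (c == recordDelimiter) return InputToken.RecordDelimiter;
        if (c == fieldDelimiter) return InputToken.FieldDelimiter;
        if (c == wrapChar) return InputToken.WrapChar;
        if (char.IsWhiteSpace(c)) return InputToken.Whitespace;
        return InputToken.Char;
    }
}
```

With a classifier like this, the reader only has to dispatch on (state, token) pairs instead of on raw characters.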

The output (or action) for a given input depends on the state the reader is in and on whether the input character triggers a transition. EoF, field delimiters and record delimiters trigger a state transition from 'InsideField' to 'OutsideField'.

I didn't put the actions on the diagram. 
For every input character, the reader can output 0, 1 or 2 characters. If the input contains the wrap character inside a field, it must be doubled. This requires the ability to buffer a few characters.

field1,fiel"d2; => "field1","fiel""d2";
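To make the two states and the doubling rule concrete, here is a rough sketch of the whole transformation as a single method. It uses a plain if/else on a state enum rather than the State pattern, works on a string instead of a stream, and treats whitespace as an ordinary character. FieldWrapper, Wrap and the default delimiters are hypothetical names and values chosen for illustration; this is not the code of the actual reader.

```csharp
using System;
using System.IO;
using System.Text;

public static class FieldWrapper
{
    private enum State { OutsideField, InsideField }

    // Assumed defaults, taken from the examples: ',' = FD, ';' = RD, '"' = WC.
    public static string Wrap(string input, char fd = ',', char rd = ';', char wc = '"')
    {
        var output = new StringBuilder();
        var state = State.OutsideField;

        using (var reader = new StringReader(input))
        {
            while (true)
            {
                int ch = reader.Read();                    // -1 signals EoF
                bool eof = ch < 0;
                bool delimiter = !eof && (ch == fd || ch == rd);

                if (state == State.OutsideField && !eof && !delimiter)
                {
                    output.Append(wc);                     // first char of a field: open the wrapper
                    state = State.InsideField;
                }
                else if (state == State.InsideField && (eof || delimiter))
                {
                    output.Append(wc);                     // field ends: close the wrapper
                    state = State.OutsideField;
                }

                if (eof) break;

                if (state == State.InsideField && ch == wc)
                {
                    output.Append(wc);                     // wrap char inside a field is doubled
                }
                output.Append((char)ch);                   // pass the input character through
            }
        }
        return output.ToString();
    }
}
```

In this sketch delimiters are passed through unchanged, so Wrap("field1,fiel\"d2;") returns "field1","fiel""d2"; including the trailing record delimiter. A delimiter seen while outside a field is simply copied, which means no empty fields are generated.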

Whitespace could be configured to be skipped when it is not significant (trimming).
RD or FD in the 'OutsideField' state could also be configured to generate empty fields.

For this implementation I chose the object-oriented State design pattern.


// One handler per input token; each concrete state
// ('inside a field', 'outside a field') overrides all of them.
internal abstract class StreamState
{
    public abstract int CharInput(WrappingFsmStreamReader context, int character);
    public abstract int WhitespaceInput(WrappingFsmStreamReader context);
    public abstract int FieldDelimiterInput(WrappingFsmStreamReader context);
    public abstract int RecordDelimiterInput(WrappingFsmStreamReader context);
    public abstract int WrapInput(WrappingFsmStreamReader context);
    public abstract int EoFInput(WrappingFsmStreamReader context);
}
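To give an idea of what the concrete states behind such an abstract class might look like, here is a cut-down sketch of the State pattern. It is deliberately simplified and not the actual implementation: the handlers return void and write to a StringBuilder instead of returning an int, FD and RD share one handler, and whitespace handling is left out. All names (WrapperContext, FieldState, OutsideField, InsideField, StatePatternDemo) are invented for this example.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the reader: holds the output and the current state.
public sealed class WrapperContext
{
    public const char WrapChar = '"';
    public readonly StringBuilder Output = new StringBuilder();
    public FieldState State = OutsideField.Instance;
}

public abstract class FieldState
{
    public abstract void CharInput(WrapperContext context, char character);
    public abstract void DelimiterInput(WrapperContext context, char delimiter);
    public abstract void WrapInput(WrapperContext context);
    public abstract void EoFInput(WrapperContext context);
}

public sealed class OutsideField : FieldState
{
    public static readonly OutsideField Instance = new OutsideField();

    public override void CharInput(WrapperContext context, char character)
    {
        // First character of a field: open the wrapper and go inside.
        context.Output.Append(WrapperContext.WrapChar).Append(character);
        context.State = InsideField.Instance;
    }

    public override void DelimiterInput(WrapperContext context, char delimiter)
    {
        context.Output.Append(delimiter); // no empty field is generated here
    }

    public override void WrapInput(WrapperContext context)
    {
        // A field that starts with the wrap char: open the wrapper, then double the char.
        context.Output.Append(WrapperContext.WrapChar, 3);
        context.State = InsideField.Instance;
    }

    public override void EoFInput(WrapperContext context) { } // nothing to close
}

public sealed class InsideField : FieldState
{
    public static readonly InsideField Instance = new InsideField();

    public override void CharInput(WrapperContext context, char character)
    {
        context.Output.Append(character);
    }

    public override void DelimiterInput(WrapperContext context, char delimiter)
    {
        // Field ends: close the wrapper, emit the delimiter, go outside.
        context.Output.Append(WrapperContext.WrapChar).Append(delimiter);
        context.State = OutsideField.Instance;
    }

    public override void WrapInput(WrapperContext context)
    {
        context.Output.Append(WrapperContext.WrapChar, 2); // doubled inside a field
    }

    public override void EoFInput(WrapperContext context)
    {
        context.Output.Append(WrapperContext.WrapChar);    // close the last field
        context.State = OutsideField.Instance;
    }
}

public static class StatePatternDemo
{
    // Assumes ',' and ';' as delimiters, as in the examples in this post.
    public static string Wrap(string input)
    {
        var context = new WrapperContext();
        foreach (char c in input)
        {
            if (c == ',' || c == ';') context.State.DelimiterInput(context, c);
            else if (c == WrapperContext.WrapChar) context.State.WrapInput(context);
            else context.State.CharInput(context, c);
        }
        context.State.EoFInput(context);
        return context.Output.ToString();
    }
}
```

The attraction of this layout is that every (state, token) combination is an explicit method: a forgotten case shows up as a missing override at compile time instead of as a failing test later.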

Here is a sample.

So next time you need a custom StreamReader, it makes sense to model it as an FSM. Define and document the states and behavior, and use this inside knowledge to write the unit tests.

Posted on Monday, May 25, 2009 9:55 AM | BizTalk - EAI - B2B


Comments on this post: Custom StreamReader with FSM


Copyright © Gregory Van de Wiele | Powered by: GeeksWithBlogs.net