The Architect´s Napkin

Software Architecture on the Back of a Napkin

The Incremental Architect´s Napkin - #6 - Branch flows for alternative processing

Confronted with an Entry Point into your software, don´t start coding right away. Instead, think about how the functionality should be structured. Into which processing steps can you partition the scope? What should be done first, what comes next, what then, what finally? Devise a flow of data.

Think of it as an assembly line. Some raw input plus possibly some additional material is transformed into shiny output data (or some side effect fireworks).

Here is the Flow Design of the de-duplication example again:

image

That´s a simple sequential flow. Control flows along with the data. And it´s a one dimensional (1D) flow. There is just one path from start to end through the graph of processing nodes.

Such flows are common. For many functions they are sufficient to describe the steps to accomplish what´s required. And as you saw in the previous chapter they are easy to translate into code:

static void Main(string[] args)
{
    var input = Accept_string_list(args);
    var output = Deduplicate(input);
    Present_deduplicated_string_list(output);
}

Streams causing alternative flows

So much for the happy day. But what if an error occurs? Input could be missing or malformed. Surely you would not want the program to just crash with a cryptic error message.

If "graceful failure" becomes a requirement, how could it be added to the current design? I suggest a preliminary processing step for validation:

image

It´s still a sequential 1D flow - but now the processing steps after validation are optional, so to speak. See the stream coming out of validation? The asterisk means that maybe (args) will flow out, maybe not. It depends on whether the command line arguments were validated correctly.

For simplicity´s sake let´s assume validation just checks if the program gets called with exactly one command line argument. If not, an error message should be printed to standard output.

This could now easily be implemented:

image
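
The screenshot with the code is not available here, so the following is only a minimal sketch of how such an implementation might look, not the original listing. It assumes the single command line argument is a comma-separated list of strings; that format, the usage text and the class name DedupWithValidation are illustrative assumptions. The three downstream steps are passed to the validation as a continuation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DedupWithValidation
{
    // Entry Point: the three steps after validation run as a continuation.
    public static void Run(string[] args)
    {
        Validate_command_line(args,
            validArgs => {
                var input = Accept_string_list(validArgs);
                var output = Deduplicate(input);
                Present_deduplicated_string_list(output);
            });
    }

    // The stream (args)*: the continuation is called once or not at all.
    static void Validate_command_line(string[] args, Action<string[]> onValid)
    {
        if (args.Length == 1)
            onValid(args);
        else
            Console.WriteLine("Usage: dedup \"a,b,a\""); // hypothetical message text
    }

    // Assumption: the strings arrive comma-separated in the first argument.
    static IEnumerable<string> Accept_string_list(string[] args) => args[0].Split(',');

    static IEnumerable<string> Deduplicate(IEnumerable<string> input) => input.Distinct();

    static void Present_deduplicated_string_list(IEnumerable<string> output) =>
        Console.WriteLine(string.Join(",", output));
}
```

Calling DedupWithValidation.Run(args) from Main() keeps the Entry Point free of any visible control statement; the if lives inside Validate_command_line().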

And the effect would be stunning when running the program with invalid command line parameters:

image

Look at the Entry Point closely. First notice: the flow is still readily visible, even though three of the four steps now run as a continuation. Don´t let yourself be deceived by this. Just read the program text from top to bottom.

Technically, though, it´s not that simple. The last three steps are not just written after the first one. They are nested. They get injected. That´s what makes conditional execution possible without a (visible) control statement.

Now there are two alternative execution paths through the flow:

  1. Validate_command_line()
  2. Validate_command_line(), Accept_string_list(), Deduplicate(), Present_deduplicated_string_list()

The alternatives are signified by the stream flowing from the validation.

And of course there is a control statement deciding between alternatives. But it´s not part of the Flow Design. It´s an implementation detail of Validate_command_line(). The data flow remains free of logic even though there are alternative paths through it.

Take the indentation of the continuation as a hint for the alternative. This might look a bit strange at first, but you´ll get used to it. Or, if you like, find some other formatting for continuations. Just be sure to keep an eye on consistency and readability - within the limits of a textual flow representation.

Branches for explicit alternative flows

Validating the command line in this way works - but it´s not a clean solution. It´s not clean, because the SRP is violated. The validation has more than a single responsibility. It has at least two: it checks for validity (expression plus control statement) and it notifies the user (API-calls).

That´s not good. The responsibilities should be separated. One functional unit for the actual check, another for user notification.

This, though, cannot be accomplished with a 1D flow. There need to be explicit branches: one for the happy day, another one for the rainy day.

image

You see, functional units can have more than one output. In fact, any number of output ports is ok. As for the translation, it should be obvious that with more than one output a translation into return is not possible. If more than one output port is present, all of them should be translated into function pointers. I don´t recommend mixing return with function pointers.

This is what the translation looks like:

image
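
Since the screenshot is missing, here is a hedged sketch of what a translation with two output ports could look like. The continuation names onValid and onInvalid, the message text and the comma-separated input format are assumptions made for illustration; the original listing may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DedupWithBranches
{
    public static void Run(string[] args)
    {
        Validate_command_line(args,
            onValid: validArgs => {
                var input = Accept_string_list(validArgs);
                var output = Deduplicate(input);
                Present_deduplicated_string_list(output);
            },
            onInvalid: Present_error_message);
    }

    // Single responsibility: only the check. Both output ports are function pointers.
    static void Validate_command_line(string[] args, Action<string[]> onValid, Action onInvalid)
    {
        if (args.Length == 1)
            onValid(args);
        else
            onInvalid();
    }

    // Single responsibility: only the user notification.
    static void Present_error_message() =>
        Console.WriteLine("Usage: dedup \"a,b,a\""); // hypothetical message text

    // Assumption: the strings arrive comma-separated in the first argument.
    static IEnumerable<string> Accept_string_list(string[] args) => args[0].Split(',');

    static IEnumerable<string> Deduplicate(IEnumerable<string> input) => input.Distinct();

    static void Present_deduplicated_string_list(IEnumerable<string> output) =>
        Console.WriteLine(string.Join(",", output));
}
```

Validate_command_line() no longer touches the console, so it can be tested by passing in two small lambdas that record which port was used.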

Both functions - Validate_command_line() and Present_error_message() - now have a single responsibility. And the flow in Main() is still pretty clear - at least once you have gotten used to "thinking in functions".

The two paths through the flow now are:

  1. Validate_command_line(), Present_error_message()
  2. Validate_command_line(), Accept_string_list(), Deduplicate(), Present_deduplicated_string_list()

If you have a hard time figuring this out from the code, give yourself some time. Realize how you have to re-program your brain. It´s so used to seeing nested calls that it´s now confused. There is nesting - but the nested code is not called first? Yes. That´s a result of some Functional Programming here. The flow translation uses functions (lambda expressions) as first class data citizens.

In case you have wondered so far what all the fuss about lambdas and closures in C# (or Java) was about... Now you see what they are useful for: to easily translate Flow Designs into code.

Yes, this looks a bit clumsy. But that´s due to C# (or Java or C++ or JavaScript) being object oriented languages. And it´s due to a textual notation. Expressing alternatives in text is always a difficult thing. In a visual notation alternatives often are put side by side. That´s not possible with current text based IDEs. So don´t blame the unusual code layout on Flow Design alone.

Finally: Let me assure you that it´s possible to get used to reading this kind of code fluently. Hundreds of developers I´ve trained over the past years have accomplished this feat. So can you.

Back to the problem:

Please note how the two continuations of Validate_command_line() do not hint at what´s going to happen next downstream. Their names refer to the purpose of the function, not its environment. That´s what makes the function adhere to the PoMO (Principle of Mutual Oblivion).

Both names make it obvious which output port of Validate_command_line() is used when. That´s not so obvious in design. When you look at the Validate command line "bubble" with its two outputs you can´t see which one belongs to which alternative.

For such a small flow that´s not really a problem. But think of more than two outputs, or of alternatives that are not mutually exclusive. So, if you like, annotate the Flow Design with port names. I do it like this:

image

You can do the same for input ports, should there be more than one. Put a name next to the port, prefixed with a dot. That way the name looks like a property of the functional unit.

Also notice how both outputs are streams. That´s to signify the optionality of data flowing. It´s a conceptual thing and not technically necessary.

You can translate streams to function pointers, but in C# at least you also could choose yield return with an iterator return type (IEnumerable). Or if output data is not streamed you can still translate the output port to a function pointer.
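
As a small illustration of the yield return option, here is a sketch of the validation with its (args)* output port translated into an iterator. The class name is hypothetical, and note that this variant has only the one output port, so it corresponds to the 1D design rather than the branched one.

```csharp
using System.Collections.Generic;

static class ValidationAsIterator
{
    // The stream (args)* becomes an IEnumerable that yields zero or one element.
    public static IEnumerable<string[]> Validate_command_line(string[] args)
    {
        if (args.Length == 1)
            yield return args;
    }
}
```

Downstream code would consume it with foreach (var validArgs in ValidationAsIterator.Validate_command_line(args)) { ... }; the loop body simply does not run if nothing flows.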

Still, though, I guess designs are easier to understand if you put in the asterisk. Don´t think of streams as a big deal. It´s just as if functions could have an optional return value. (Which would be different from returning an option value like in F#.)

Why is no error message flowing out of validation? That´s just a design choice. In this case I decided against it, since there is only one error case. In other situations an error text could flow from several different validation steps to a single error reporting functional unit. Or just an error case identifier (enum). Or even an exception; instead of throwing it right away the validation could leave the decision what to do to some other functional unit.

Flow Design as language creation

As you see, the lack of control statements in Flow Design does not mean single flow of data. Flows can have many branches - although from a certain point on this becomes unwieldy. Spaghetti flows are a real danger like spaghetti code.

That´s also the reason why I would like to caution you against introducing cycles into your flow graphs. Keep them free of loops. Only very rarely should there be a need to let data flow back to an upstream functional unit.

Likewise don´t try to simulate control flow. Branching being possible does not mean, you should name your processing steps "if" or "while". This would lower the level of abstraction of your design. It would defy its purpose.

Flow Design is about creating a Domain Specific Language (DSL) en passant. It´s supposed to be declarative. It´s supposed to be on a higher level of abstraction than your programming language. Take the Flow Design notation as a universal syntax to declaratively describe solutions in arbitrary domains.

How such flows are executed should be of little or no concern to you. It´s like writing a Christmas gift list. Your daughter wants a pony, your son a real racing car? They don´t care how Santa Claus manages to fulfill their wishes.

Likewise at design time trust there will be a way to implement each processing step. Later. And the more fine grained they are the easier it will be. But until then assume they are already present and functioning. Any functional unit you like. On any level of abstraction. It´s like wielding a magic wand, e.g. "Let there be a functional unit for command line parameter validation!"

There might be one or many control statements needed to implement a functional unit. But let that not leak into your design; don´t anticipate so much. Instead label your functional units with a domain specific phrase. One that describes what is happening, not how. That makes for a declarative DSL consisting of many words and phrases that are descriptive - and even re-usable.

Generalization: 2-dimensional flows

The result of Flow Design then is a flow with possibly many alternatives. A flow that branches like a river does. I call that a 2-dimensional flow because it´s not just one sequence of processing steps (1D), but many, in parallel (2D).

image

2D flows are data flows like 1D flows. There is nothing new to them in terms of parallel processing. Whether two processing steps are wired-up after one another or as alternatives does not require them to be implemented using multiple threads. It´s possible to do that, though, and Flow Design makes it easier because its data flow is oblivious to control flow.

So don´t rush to learn about Actor frameworks or async/await in C# because you want to apply Flow Design to your problems. Such technologies are orthogonal to Flow Design. For a start just rely on ordinary functions to implement processing steps. That does not diminish the usefulness of functional design.

What does 2-dimensionality mean? It means, data can flow along alternative paths through the network of nodes. Here are the paths for the above 2D flow:

image

That does not mean it´s one or the other. Data can flow along many paths at the same time: at least conceptually, but also (almost) truly at runtime if you choose to employ multi-threading of some sort. It need not be "either this path or that", it can be "this path as well as that".

But don´t let that confuse you right now. Without a tangible problem demanding that kind of sophisticated flow design these are pretty abstract musings. In practice this is largely no problem. Most flows are pretty straightforward.

Just keep in mind: this is data flow, not control flow. That means it´s unidirectional data exchange between independent functional units who don´t know anything about each other. They just happen to offer certain behavior which expresses itself as producing certain output or some side effect upon certain input.

Similar flows flowing back together

Data flows can not only be split into branches, they can also flow back into each other or be joined.

Think of the famous Fizz Buzz kata: Numbers in a range, e.g. 1..100, are to be output in a special way. If a number is divisible by 3, "Fizz" should be written; if it´s divisible by 5, "Buzz" should be written; and if it´s divisible by both 3 and 5, "FizzBuzz" should be written. Any other number is output as is.

Usually this kata is used to practice TDD. But of course it can also be tackled with Flow Design, although its scope is very narrow and the solution thus might feel a little clumsy. Basically it´s a small algorithmic problem. So Flow Design is almost overkill.

On the other hand it´s perfect to illustrate branching and flowing back.

The task is to implement Fizz Buzz as a function like this: void FizzBuzz(int first, int last). For a given range of numbers the translations should be printed to standard output.

What´s to be done? What are the building blocks for this functionality? Here´s the result of my brainstorming:

  • Print numbers or their translations.
  • Translate number
    • First classify number
    • Then convert it
  • Number generation
  • Check range. If it´s an invalid range, throw an exception.

Notice how fine grained these processing steps are. Before I start coding I´m always eager to determine the different responsibilities. That´s one of the tasks of any design: separate aspects, responsibilities, concerns.

Printing numbers certainly is different from all else. It´s about calling an API, it´s communication with the environment, whereas the other processing steps belong to the Fizz Buzz domain.

Validation also is different from translation, isn´t it? Translation rules could change. That should not affect the validation function.

Also classification rules could change. That should not affect the functions for converting a certain class of numbers. And the same holds the other way around.

"Seeing responsibilities" is one of the "arts" of software development. It can be trained, but except for some hard and fast rules in the end it remains a quite creative act. Be prepared to revise your decisions. Also be prepared for dissent in your team. But with regular reflection you´ll master this art.

Here is my Flow Design for the above bullet points:

image

Let me point out a couple of things:

  • Note that multiple values can flow as data at once, e.g. (first, last). Those are tuples. Passing them in as input is easy: they map to a list of formal function parameters. But how to generate them as output? There are various options depending on the programming language you use.
  • Streams are used again to signify optional output. For each number data on only one port will flow out of classification.
  • The streams flowing into the translation steps produce an output stream. That´s the right thing to do here. In other scenarios, though, an input stream could result in just one output value. Think of aggregation.

Like I said, for the problem at hand this might be a bit overkill. A quite elaborate flow for such simple functionality. On the other hand that´s perfect: The problem domain is easy to understand so we can focus on the features of Flow Design and their translation into code.

Here you see how it´s possible to have many output ports on a processing step and how many branches can flow back into one.

The visual notation makes that very easy. But how does it look in code? Will it still be readily understandable?

Let´s start with some of the processing steps:

image

Each of the steps is very small, very focused, very easy to understand. I think that´s a good thing. Functions should be small, shouldn´t they? Some say no more than 10 LOC, others say 40 LOC or "a screenful of code". In any case Flow Design very naturally leads to small functions. Don´t wait for refactoring to downsize your functions. Do it right from the beginning. You save yourself quite some refactoring trouble.

My favorite function is Classify_number(), you know. Because it´s so different from the usual Fizz Buzz implementations. Here it truly has a single responsibility: It´s the place where numbers are analyzed. It´s where the Fizz Buzz rule is located, which says that numbers must not all be treated the same.

Fizz Buzz originally is a drinking game. Whoever fails at "counting" correctly has to drink some more - which makes it even harder to "count". The main mental effort goes into checking if a number needs translation. It´s about math - which is not easy for everyone even when sober ;-) And exactly this checking is represented by Classify_number(). No number generation, no translation, just checking.

That´s also the reason, why I did not bother to apply the Single Level of Abstraction (SLA) principle. I did not refactor the conditions out into their own functions but left them in there, even with a small duplication. Still the function can be tested very, very easily.
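
To make the discussion concrete, here is a minimal sketch of how a Classify_number() with one function pointer per output port might look. The port names (onFizz, onBuzz, onFizzBuzz, onOther) and the choice to pass the number along on every port are assumptions for illustration, not the original code. The conditions are deliberately left inline, including the small duplication of the modulo checks.

```csharp
using System;

static class FizzBuzzClassification
{
    // For each number, data flows out on exactly one of the four output ports.
    public static void Classify_number(int number,
        Action<int> onFizz, Action<int> onBuzz,
        Action<int> onFizzBuzz, Action<int> onOther)
    {
        if (number % 3 == 0 && number % 5 == 0)
            onFizzBuzz(number);
        else if (number % 3 == 0)
            onFizz(number);
        else if (number % 5 == 0)
            onBuzz(number);
        else
            onOther(number);
    }
}
```

A test only needs four lambdas that record which port was hit, e.g. Classify_number(9, n => hit = "fizz", ...), with no console or number generation involved.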

And now for the main function of the solution where the process is assembled from the functional units:

image

This might look a bit strange to you. But try to see through this. Try to see how systematic this translation is. And in the end you´ll see how the data flows even in the code. From that you then can re-generate the diagram. The code is the design. And when you change the code according to the PoMO it will stay in sync with the design because it is just a "serialization" of a flow.

If you look closely, though, you might spot a seeming deviation from the design. Print() is repeated in every branch instead of calling the function just once. But in fact it´s not a deviation but a detail of the way several streams need to be joined back together into one. See it not as several calls of a function, but as a single point. It´s just 1 name, 1 function and thus represents the 1 point circled in the Flow Design.
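
The following sketch shows one way the whole flow could be assembled; it is an approximation under assumptions, not the original listing. All names apart from FizzBuzz(), Classify_number() and Print() (Check_range, Generate_numbers, the Convert_... functions, the port names) are hypothetical, and the number stream is translated with yield return plus a foreach rather than with continuations. Note how Print() shows up once per branch although it is a single point in the design.

```csharp
using System;
using System.Collections.Generic;

static class FizzBuzzFlow
{
    public static void FizzBuzz(int first, int last)
    {
        Check_range(first, last);
        foreach (var number in Generate_numbers(first, last))
            Classify_number(number,
                onFizz:     n => Print(Convert_fizz(n)),
                onBuzz:     n => Print(Convert_buzz(n)),
                onFizzBuzz: n => Print(Convert_fizzbuzz(n)),
                onOther:    n => Print(Convert_number(n)));
    }

    static void Check_range(int first, int last)
    {
        if (first > last)
            throw new ArgumentException("Invalid range: first must not exceed last.");
    }

    // The stream (int)* translated into an iterator.
    static IEnumerable<int> Generate_numbers(int first, int last)
    {
        for (var n = first; n <= last; n++)
            yield return n;
    }

    static void Classify_number(int number,
        Action<int> onFizz, Action<int> onBuzz,
        Action<int> onFizzBuzz, Action<int> onOther)
    {
        if (number % 3 == 0 && number % 5 == 0) onFizzBuzz(number);
        else if (number % 3 == 0) onFizz(number);
        else if (number % 5 == 0) onBuzz(number);
        else onOther(number);
    }

    static string Convert_fizz(int number) => "Fizz";
    static string Convert_buzz(int number) => "Buzz";
    static string Convert_fizzbuzz(int number) => "FizzBuzz";
    static string Convert_number(int number) => number.ToString();

    // The one point where all four branches flow back together.
    static void Print(string text) => Console.WriteLine(text);
}
```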

Joining dissimilar flows

Here´s another scenario where branching helps - but how those branches flow back together is different.

The task is to write a function that formats CSV data. Its signature looks like this: string FormatCsv(string csv).

The input data are CSV records, e.g.

Name;Age;City
Peter;26;Hamburg
Paul;45;London
Mary;38;Copenhagen

And the output is supposed to look like this:

Name |Age|City
-----+---+----------
Peter|26 |Hamburg
Paul |45 |London
Mary |38 |Copenhagen

The function generates an ASCII table from the raw data. The header is separated from the data records. And the columns are spaced to accommodate the longest value in either header or data records.

What are the aspects, the features of this functionality?

  • Determine column width
  • Parse input
  • Format header
  • Format data records - which should work like formatting the header
  • Format separator - which looks quite different from formatted data
  • Build the whole table from the formatted data

The order of these processing steps is simple. And as it turns out, some processing can be done in parallel:

image

Once the column widths have been determined, formatting the data and formatting the separator are independent of each other. That´s why I branched the flow and put the Format... processing steps in parallel.

Notice the asterisk in (csvRecord*) or (colWidth*). It denotes a list of values in an abstract manner. Whether you implement the list as an array or some list type or IEnumerable in .NET is of no concern to the design. Compare this to the asterisk outside the brackets denoting a stream of single values: (int*) stands for a list (when data flows it contains multiple values), (int)* stands for a stream (data flows multiple times containing a single value).
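
The difference between the two notations can be sketched in code like this. The producer functions and their fixed sample values are purely hypothetical; they only serve to show that a list arrives in one delivery while a stream arrives in several.

```csharp
using System;
using System.Collections.Generic;

static class ListVersusStream
{
    // (int*): data flows once and contains multiple values.
    public static void Produce_list(Action<IEnumerable<int>> onNumbers)
    {
        onNumbers(new[] { 1, 2, 3 });
    }

    // (int)*: data flows multiple times, each time containing a single value.
    public static void Produce_stream(Action<int> onNumber)
    {
        foreach (var n in new[] { 1, 2, 3 })
            onNumber(n);
    }
}
```

With Produce_list() the continuation is called once and receives three values; with Produce_stream() it is called three times with one value each.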

Formatting the separator just takes in the column width values. But formatting the records also takes in the records. Notice the "|" before the data description. It means "the following is what really flows into the next functional unit". It´s used in cases where upstream different data is output than downstream requires as input.

Determine col width outputs (colWidth*), but Format records requires (csvRecord*, colWidth*). That´s expressed by (colWidth*) | (csvRecord*, colWidth*) on the arrow pointing from Determine... to Format records.

This means, a flow defines a context. Within this context data can be "re-used". In this case the csvRecord* coming out of Parse is used again for formatting. (In code this is easy to achieve if a flow is put together in a single function. Then data can be assigned to local variables.)

Most importantly, though, this Flow Design sports a join. The join is a special functional unit. It takes n input flows and produces 1 output flow. The data of the output is a tuple combining data from all inputs.

The join waits for data to arrive on all inputs. And it outputs a tuple whenever an input changes. In this case, though, once output was generated, the join clears its inputs. So for the next output new input has to arrive on both input ports. That´s called an auto-reset join. [1]

Sounds complicated? Maybe. But in the end it´s really simple. As you see in the implementation a join - even though being a functional unit of its own in the flow - does not require an extra function:
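
Purely to illustrate the behavior just described, an explicit auto-reset join with two inputs could be sketched as a small class. This is an assumption-laden illustration, not part of the original solution: the class name, its members and the restriction to two inputs are hypothetical, and the sketch is not thread-safe.

```csharp
using System;

sealed class AutoResetJoin<T1, T2>
{
    private readonly Action<T1, T2> _onJoined;
    private T1 _input1;
    private T2 _input2;
    private bool _has1, _has2;

    public AutoResetJoin(Action<T1, T2> onJoined) { _onJoined = onJoined; }

    public void Input1(T1 value) { _input1 = value; _has1 = true; TryOutput(); }
    public void Input2(T2 value) { _input2 = value; _has2 = true; TryOutput(); }

    private void TryOutput()
    {
        // Wait until data has arrived on all inputs.
        if (!(_has1 && _has2)) return;

        var a = _input1;
        var b = _input2;

        // Auto-reset: clear the inputs so new data must arrive on both ports.
        _has1 = _has2 = false;
        _input1 = default(T1);
        _input2 = default(T2);

        _onJoined(a, b);
    }
}
```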

image
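
With the screenshot missing, here is a hedged sketch of how FormatCsv() might be put together so that the join is nothing more than a call with two parameters. The function names follow the bullet points above but are assumptions, as is the simplification that the header is formatted like any other record; the sketch also assumes non-empty input in which every record has the same number of columns. Unlike the sample output above, it pads the last column to full width.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvFormatter
{
    public static string FormatCsv(string csv)
    {
        var records = Parse(csv).ToList();
        var colWidths = Determine_col_widths(records);

        // Two independent branches, both fed by the column widths.
        var formattedRecords = Format_records(records, colWidths);
        var separator = Format_separator(colWidths);

        // The join: a plain function call with two parameters.
        return Build_table(formattedRecords, separator);
    }

    static IEnumerable<string[]> Parse(string csv) =>
        csv.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
           .Select(line => line.Split(';'));

    // Width of a column = length of its longest value, header included.
    static int[] Determine_col_widths(IReadOnlyList<string[]> records) =>
        Enumerable.Range(0, records[0].Length)
                  .Select(col => records.Max(r => r[col].Length))
                  .ToArray();

    static IEnumerable<string> Format_records(IEnumerable<string[]> records, int[] colWidths) =>
        records.Select(r =>
            string.Join("|", r.Select((value, col) => value.PadRight(colWidths[col]))));

    static string Format_separator(int[] colWidths) =>
        string.Join("+", colWidths.Select(w => new string('-', w)));

    // The first formatted record is the header; the separator goes right below it.
    static string Build_table(IEnumerable<string> formattedRecords, string separator)
    {
        var lines = formattedRecords.ToList();
        lines.Insert(1, separator);
        return string.Join("\n", lines);
    }
}
```

The records parsed at the start are simply kept in a local variable and used again for formatting, which is the "re-use" within the flow´s context mentioned above.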

A simple function call with n parameters will do most often to bring together several branches - at least as long as you don´t resort to real parallel processing in those branches.

That´s why sometimes I simplify the join like this:

image

That way it no longer looks so "massive". It´s more a part of the downstream processing step.

For the remaining code of the CSV formatter see the implementation in the accompanying GitHub repository.

In closing

I hope I was able to instill some faith in you that Flow Design is rich enough to model solutions to real problems. Even though it´s not a full blown programming language it allows you to express "processes" of all sorts to deliver on all the functional and many quality requirements of your customers.

1D and 2D flows are declarative expressions of "how things work" once control enters a software through an Entry Point.

Mutually oblivious functional units are all you need to avoid many of the pitfalls of programming usually leading to dirty code.

But wait! There´s more! ;-) You surely want to know how to scale those flows to build arbitrarily large processes.


  1. You might think, if there is an auto-reset join, there could be a manual-reset join, too. And you´re right. So far, though, I´ve found that to be of rare use. That´s why I´m not going into detail on that here.

Posted on Saturday, August 30, 2014 11:26 AM | Filed Under [ The Incremental Architect´s Napkin ]

Feedback


# re: The Incremental Architect´s Napkin - #6 - Branch flows for alternative processing

When looking at your examples I see that all functions only use simple types for their parameters. What about using complex types (classes with some properties) for simple data transport? Is this desirable? Or does it simply indicate that the functional unit is still too big? Maybe it also violates the PoMO?
9/22/2014 3:20 PM | Reinhard

# re: The Incremental Architect´s Napkin - #6 - Branch flows for alternative processing

Data can be any type you like. We´re not talking DTOs here. This is no remote communication. However... classes made for containing data (data structures) should be designed in a specific way. I´ll cover that in a future article.
9/22/2014 7:05 PM | Ralf Westphal